ABSTRACT. Oriented closed curves on an orientable surface with boundary are described
up to continuous deformation by reduced cyclic words in the generators of the fundamental
group and their inverses. By self-intersection number one means the minimum number
of transversal self-intersection points of representatives of the class. We prove that if a class
is chosen at random from among all classes of m letters, then for large m the distribution
of the self-intersection number approaches the Gaussian distribution. The theorem was
strongly suggested by a computer experiment with four million curves producing a very
nearly Gaussian distribution.

1. Introduction

Oriented closed curves in a surface with boundary are, up to continuous deformation,
described by reduced cyclic words in a set of free generators of the fundamental group
and their inverses. (Recall that such words represent the conjugacy classes of the
fundamental group.) Given a reduced cyclic word $\alpha$, define the self-intersection number
$N(\alpha)$ to be the minimum number of transversal double points among all closed curves
represented by $\alpha$. (See Figure 1.) Fix a positive integer $n$ and consider how the
self-intersection number $N(\alpha)$ varies over the population $\mathcal{F}_n$ of all reduced cyclic
words of length $n$. The value of $N(\alpha)$ can be as small as 0, but no larger than $O(n^2)$.
See [?], [6] for precise results concerning the maximum of $N(\alpha)$ for $\alpha \in \mathcal{F}_n$, and [?]
for sharp results on the related problem of determining the growth of the number of
non-self-intersecting closed geodesics up to a given length relative to a hyperbolic metric.

FIGURE 1. Two representatives of aabb in the doubly punctured plane.
The second curve has fewest self-intersections in its free homotopy class.

FIGURE 2. A histogram showing the distribution of self-intersection numbers
over all reduced cyclic words of length 19 in the doubly punctured plane.
The horizontal coordinate shows the self-intersection count k; the
vertical coordinate shows the number of cyclic reduced words for which
the self-intersection number is k.

For small values of $n$, using algorithms in [8] and [5], we computed the self-intersection
counts $N(\alpha)$ for all words $\alpha \in \mathcal{F}_n$ (see [4]). Such computations show that, even for
relatively small $n$, the distribution of $N(\alpha)$ over $\mathcal{F}_n$ is very nearly Gaussian. (See Figure 2.)
The purpose of this paper is to prove that as $n \to \infty$ the distribution of $N(\alpha)$ over the
population $\mathcal{F}_n$, suitably scaled, does indeed approach a Gaussian distribution:
Main Theorem. Let $\Sigma$ be an orientable, compact surface with boundary and negative Euler
characteristic $\chi$, and set

(1) $\kappa = \kappa_\Sigma = \dfrac{\chi}{3(2\chi - 1)}$ and $\sigma^2 = \sigma_\Sigma^2 = \dfrac{2\chi(2\chi^2 - 2\chi + 1)}{45(2\chi - 1)^2(\chi - 1)}$.

Then for any $a < b$ the proportion of words $\alpha \in \mathcal{F}_n$ such that

$$a < \frac{N(\alpha) - \kappa n^2}{n^{3/2}} < b$$

converges, as $n \to \infty$, to

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_a^b \exp\{-x^2/2\sigma^2\}\, dx.$$

Observe that the limiting variance $\sigma^2$ is positive if the Euler characteristic is negative.
Consequently, the theorem implies that (i) for most words $\alpha \in \mathcal{F}_n$ the self-intersection
number $N(\alpha)$ is to first order well-approximated by $\kappa n^2$; and (ii) typical variations of
$N(\alpha)$ from this first-order approximation ("fluctuations") are of size $n^{3/2}$.
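
As a quick numerical companion to formula (1), the following sketch evaluates the two constants exactly for a given Euler characteristic and rescales a self-intersection count as in the Main Theorem. The function names `limiting_constants` and `standardize` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def limiting_constants(chi):
    """Return (kappa, sigma^2) from formula (1) for an integer chi < 0."""
    if chi >= 0:
        raise ValueError("formula (1) is stated for negative Euler characteristic")
    chi = Fraction(chi)
    kappa = chi / (3 * (2 * chi - 1))
    sigma2 = (2 * chi * (2 * chi**2 - 2 * chi + 1)) / (
        45 * (2 * chi - 1) ** 2 * (chi - 1)
    )
    return kappa, sigma2


def standardize(count, n, chi):
    """Rescale a count N as in the Main Theorem: (N - kappa * n^2) / n^(3/2)."""
    kappa, _ = limiting_constants(chi)
    return (count - float(kappa) * n**2) / n**1.5
```

For $\chi = -1$ (for instance the doubly punctured plane of Figure 2) this gives $\kappa = 1/9$ and $\sigma^2 = 1/81$, so for words of length 19 the first-order approximation $\kappa n^2$ is about 40.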

It is relatively easy to understand (if not to prove) why the number of self-intersections
of typical elements of $\mathcal{F}_n$ should grow like $n^2$. Here follows a short heuristic argument:
consider the lift of a closed curve with minimal self-intersection number in its class to the
universal cover of the surface $\Sigma$. This lift will cross $n$ images of the fundamental polygon,
where $n$ is the corresponding word length, and these crossings can be used to partition
the curve into $n$ nonoverlapping segments in such a way that each segment makes one
crossing of an image of the fundamental polygon. The self-intersection count for the curve
is then the number of pairs of these segments whose images in the fundamental polygon
cross. It is reasonable to guess that for typical classes $\alpha \in \mathcal{F}_n$ (at least when $n$ is large)
these segments look like a random sample from the set of all such segments, and so the
law of large numbers then implies that the number of self-intersections should grow like
$n^2 \kappa'/2$, where $\kappa'$ is the probability that two randomly chosen segments across the
fundamental polygon will cross. The difficulty in making this argument precise, of course, is in
quantifying the sense in which the segments of a typical closed curve look like a random
sample of segments. The arguments below (see sec. 4) will make this clear.

The mystery, then, is not why the mean number of self-intersections grows like $n^2$, but
rather why the size of typical fluctuations is of order $n^{3/2}$ and why the limit distribution
is Gaussian. This seems to be connected to geometry. If the surface $\Sigma$ is equipped with
a finite-area Riemannian metric of negative curvature, and if the boundary components
are (closed) geodesics, then each free homotopy class contains a unique closed geodesic
(except for the free homotopy classes corresponding to the punctures). It is therefore
possible to order the free homotopy classes by the length of the geodesic representative.
Fix $L$, and let $\mathcal{G}_L$ be the set of all free homotopy classes whose closed geodesics are of
length $\le L$. The main result of [12] (see also [13]) describes the variation of the
self-intersection count $N(\alpha)$ as $\alpha$ ranges over the population $\mathcal{G}_L$.
Geometric Sampling Theorem. If the Riemannian metric on $\Sigma$ is hyperbolic (i.e., constant
curvature $-1$) then there exists a possibly degenerate probability distribution $G$ on $\mathbb{R}$ such
that for all $a < b$ the proportion of words $\alpha \in \mathcal{G}_L$ such that

$$a < \frac{N(\alpha) + L^2/(\pi^2 \chi)}{L} < b$$

converges, as $L \to \infty$, to $G(b) - G(a)$.

The limit distribution is not known, but is likely not Gaussian. The result leaves open
the possibility that the limit distribution is degenerate (that is, concentrated at a single
point); if this were the case, then the true order of magnitude of the fluctuations might
be a fractional power of $L$. The Geometric Sampling Theorem implies that the typical
variation in self-intersection count for a closed geodesic chosen randomly according to
hyperbolic length is of order $L$. Together with the Main Theorem, this suggests that the
much larger variations that occur when sampling by word length are (in some sense) due
to $\sqrt{n}$-variations in hyperbolic length over the population $\mathcal{F}_n$.

The Main Theorem can be reformulated in probabilistic language as follows (see
Appendix B for definitions):

Main Theorem*. Let $\Sigma$ be an orientable, compact surface with boundary and negative Euler
characteristic $\chi$, and let $\kappa$ and $\sigma$ be defined by (1). Let $N_n$ be the random variable obtained by
evaluating the self-intersection function $N$ at a randomly chosen $\alpha \in \mathcal{F}_n$. Then as $n \to \infty$,

(2) $\dfrac{N_n - n^2 \kappa}{\sigma n^{3/2}} \Longrightarrow \mathrm{Normal}(0, 1)$

where Normal(0, 1) is the standard Gaussian distribution on $\mathbb{R}$ and $\Longrightarrow$ denotes convergence in
distribution.



2. Combinatorics of Self-Intersection Counts 



Our analysis is grounded on a purely combinatorial description of the self-intersection
counts $N(\alpha)$, due to [?], [?], and [?]. For an example of this analysis, see Appendix A.

Since $\Sigma$ has non-empty boundary, its fundamental group $\pi_1(\Sigma)$ is free. We will work
with a generating set of $\pi_1(\Sigma)$ such that each element has a non-self-intersecting
representative. (Such a basis is a natural choice to describe self-intersections of free homotopy
classes.) Denote by $Q$ the set containing the elements of the generating set and their
inverses and by $g$ the cardinality of $Q$. Thus, $g = 2 - 2\chi$, where $\chi$ denotes the Euler
characteristic of $\Sigma$. We shall assume throughout the paper that $\chi \le -1$, and so $g \ge 4$. It is
not hard to see that there exists a (non-unique and possibly non-reduced) cyclic word $O$
of length $g$ such that

(1) O contains each element of Q exactly once. 

(2) The surface $\Sigma$ can be obtained as follows: label the edges of a polygon with $2g$ sides,
alternately (so every other edge is not labeled) with the letters of $O$ and glue edges
labeled with the same letter without creating Moebius bands.

This cyclic word $O$ encodes the intersection and self-intersection structure of free
homotopy classes of curves on $\Sigma$.

Since $\pi_1(\Sigma)$ is a free group, the elements of $\pi_1(\Sigma)$ can be identified with the reduced
words (which we will also call strings) in the generators and their inverses. A string is
joinable if each cyclic permutation of its letters is also a string, that is, if its last letter is not
the inverse of its first. A reduced cyclic word (also called a necklace) is an equivalence class
of joinable strings, where two such strings are considered equivalent if each is a cyclic
permutation of the other. Denote by $\mathcal{S}_n$, $\mathcal{J}_n$, and $\mathcal{F}_n$ the sets of strings, joinable strings,
and necklaces, respectively, of length $n$. Since necklaces correspond bijectively with the
conjugacy classes of the fundamental group, the self-intersection count $\alpha \mapsto N(\alpha)$ can be
regarded as a function on the set $\mathcal{F}_n$ of necklaces. This function pulls back to a function
on the set $\mathcal{J}_n$ of joinable strings, which we again denote by $N(\alpha)$, that is constant on
equivalence classes. By [5] this function has the form

(3) $N(\alpha) = \sum_{1 \le i < j \le n} H(\sigma^i \alpha, \sigma^j \alpha),$

where $H = H^{(O)}$ is a symmetric function with values in $\{0, 1\}$ on $\mathcal{J}_n \times \mathcal{J}_n$ and $\sigma^i \alpha$ denotes
the $i$th cyclic permutation of $\alpha$. (Note: $\sigma^2$ also denotes the limiting variance in (1), but it
will be clear from the context which of the two meanings is in force.)
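
The definitions of strings, joinable strings, and necklaces can be made concrete by brute-force enumeration for short lengths. The sketch below assumes a hypothetical four-letter alphabet `aAbB` ($g = 4$, upper case denoting the inverse of the lower-case generator); the function names are illustrative only, and the enumeration is exponential in $n$, so it is meant for small $n$.

```python
from itertools import product

# Hypothetical alphabet for g = 4: lower case = generator, upper case = inverse.
LETTERS = "aAbB"


def inverse(x):
    return x.swapcase()


def is_string(w):
    """Reduced word: no letter is immediately followed by its inverse."""
    return all(w[i + 1] != inverse(w[i]) for i in range(len(w) - 1))


def is_joinable(w):
    """A string whose last letter is not the inverse of its first."""
    return is_string(w) and w[-1] != inverse(w[0])


def rotations(w):
    return [w[i:] + w[:i] for i in range(len(w))]


def enumerate_words(n):
    """Return (strings, joinable strings, necklaces) of length n."""
    words = ("".join(t) for t in product(LETTERS, repeat=n))
    strings = [w for w in words if is_string(w)]
    joinable = [w for w in strings if is_joinable(w)]
    # Represent each necklace by the lexicographically least rotation.
    necklaces = sorted({min(rotations(w)) for w in joinable})
    return strings, joinable, necklaces
```

For example, `enumerate_words(4)[2]` contains the word `aabb` of Figure 1 as the representative of its necklace.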

To describe the function $H$ in the representation (3), we must explain the cyclic ordering
of letters. For a cyclic word $\alpha$ (not necessarily reduced), set $o(\alpha) = 1$ if the letters of $\alpha$
occur in cyclic (clockwise) order in $O$, set $o(\alpha) = -1$ if the letters of $\alpha$ occur in reverse
cyclic (anti-clockwise) order, and set $o(\alpha) = 0$ otherwise. Consider two (finite or infinite)
strings, $\omega = c_1 c_2 \cdots$ and $\omega' = d_1 d_2 \cdots$. For each integer $k \ge 2$ define functions $u_k$ and $v_k$
of such pairs $(\omega, \omega')$ as follows: First, set $u_k(\omega, \omega') = 0$ unless

(a) both $\omega$ and $\omega'$ are of length at least $k$; and

(b) $c_1 \ne d_1$, $c_k \ne d_k$, and $c_j = d_j$ for all $1 < j < k$.

For any pair $(\omega, \omega')$ such that both (a) and (b) hold, define

$$u_k(\omega, \omega') = \begin{cases} 1 & \text{if } k = 2 \text{ and } o(c_1 d_1 c_2 d_2) \ne 0; \\ 1 & \text{if } k \ge 3 \text{ and } o(c_1 d_1 c_2) = o(c_k d_k c_{k-1}); \text{ and} \\ 0 & \text{otherwise.} \end{cases}$$

Finally, define $v_2(\omega, \omega') = 0$ for all strings $\omega, \omega'$, and for $k \ge 3$ define $v_k(\omega, \omega') = 0$ unless
both $\omega$ and $\omega'$ are of length at least $k$, in which case

$$v_k(\omega, \omega') = u_k(c_1 c_2 \cdots c_k,\ d_k d_{k-1} \cdots d_1).$$

(Note: The only reason for defining $v_2$ is to avoid having to write separate sums for the
functions $v_j$ and $u_j$ in formula (4) and the arguments to follow.) Observe that both $u_k$
and $v_k$ depend only on the first $k$ letters of their arguments. Furthermore, $u_k$ and $v_k$
are defined for arbitrary pairs of strings, finite or infinite; for doubly infinite sequences
$x = \cdots x_{-1} x_0 x_1 \cdots$ and $y = \cdots y_{-1} y_0 y_1 \cdots$ we adopt the convention that

$$u_k(x, y) = u_k(x_1 x_2 \cdots x_k,\ y_1 y_2 \cdots y_k) \quad\text{and}\quad v_k(x, y) = v_k(x_1 x_2 \cdots x_k,\ y_1 y_2 \cdots y_k).$$

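Proposition 2.1. [5] Let $\alpha$ be a primitive necklace of length $n \ge 2$. Unhook $\alpha$ at an arbitrary
location to obtain a string $\alpha^* = a_1 a_2 \cdots a_n$, and let $\sigma^j \alpha^*$ be the $j$th cyclic permutation of $\alpha^*$.
Then

(4) $N(\alpha) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=2}^{n} \left( u_k(\sigma^i \alpha^*, \sigma^j \alpha^*) + v_k(\sigma^i \alpha^*, \sigma^j \alpha^*) \right).$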

3. Proof of the Main Theorem: Strategy 

Except for the exact values (1) of the limiting constants $\kappa$ and $\sigma^2$, which of course
depend on the specific form of the functions $u_k$ and $v_k$, the conclusions of the Main Theorem
hold more generally for random variables defined by sums of the form

(5) $N(\alpha^*) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=2}^{n} h_k(\sigma^i \alpha^*, \sigma^j \alpha^*)$

where $h_k$ are real-valued functions on the space of reduced sequences $\alpha^*$ with entries in
$Q$ satisfying the hypotheses (H0)-(H3) below. The function $N$ extends to necklaces in an
obvious way: for any necklace $\alpha$ of length $n$, unhook $\alpha$ at an arbitrary place to obtain
a joinable string $\alpha^*$, then define $N(\alpha) = N(\alpha^*)$. Denote by $\lambda_n$, $\mu_n$, and $\nu_n$ the uniform
probability distributions on the sets $\mathcal{J}_n$, $\mathcal{F}_n$, and $\mathcal{S}_n$, respectively.

(H0) Each function $h_k$ is symmetric.

(H1) There exists $C < \infty$ such that $|h_k| \le C$ for all $k \ge 1$.

(H2) For each $k \ge 1$ the function $h_k$ depends only on the first $k$ entries of its arguments.

(H3) There exist constants $C < \infty$ and $0 < \beta < 1$ such that for all $n \ge k \ge 1$ and $1 \le i < n$,

$$E_{\lambda_n} |h_k(\alpha, \sigma^i \alpha)| \le C \beta^k.$$

In view of (H2), the function $h_k$ is well-defined for any pair of sequences, finite or infinite,
provided their lengths are at least $k$. Hypotheses (H0)-(H2) are clearly satisfied for $h_k =
u_k + v_k$, where $u_k$ and $v_k$ are as in formula (4) and $u_1 = v_1 = 0$; see Lemma 4.8 of section 4.5
for hypothesis (H3).

Theorem 3.1. Assume that the functions $h_k$ satisfy hypotheses (H0)-(H3), and let $N(\alpha)$ be
defined by (5) for all necklaces $\alpha$ of length $n$. There exist constants $\kappa$ and $\sigma^2$ (given by equations
(22) below) such that if $F_n$ is the distribution of the random variable $(N(\alpha) - n^2 \kappa)/n^{3/2}$ under
the probability measure $\mu_n$, then as $n \to \infty$,

(6) $F_n \Longrightarrow \mathrm{Normal}(0, \sigma^2).$

Formulas for the limiting constants $\kappa$, $\sigma$ are given (in more general form) in Theorem 5.1
below. In section 6 we will show that in the case of particular interest, namely $h_k = u_k + v_k$
where $u_k$, $v_k$ are as in Proposition 2.1, the constants $\kappa$ and $\sigma$ defined in Theorem 5.1 assume
the values (1) given in the statement of the Main Theorem.

Modulo the proof of Lemma 4.8 and the calculation of the constants $\sigma$ and $\kappa$, the
Main Theorem follows directly from Theorem 3.1. The proof of Theorem 3.1 will proceed
roughly as follows. First we will prove (Lemma 4.2) that there is a shift-invariant, Markov
probability measure $\nu$ on the space $\mathcal{S}_\infty$ of infinite sequences $x = x_1 x_2 \cdots$ whose marginals
(that is, the push-forwards under the projection mappings to $\mathcal{S}_n$) are the uniform
distributions $\nu_n$. Using this representation we will prove, in subsection 4.4, that when $n$ is large
the distribution of $N(\alpha)$ under $\mu_n$ differs negligibly from the distribution of a related
random variable defined on the Markov chain with distribution $\nu$. See Proposition 4.7 for a
precise statement. Theorem 3.1 will then follow from a general limit theorem for certain
U-statistics of Markov chains (see Theorem 5.1).



4. The Associated Markov Chain 

4.1. Necklaces, strings, and joinable strings. Recall that a string is a sequence with
entries in the set $Q$ of generators and their inverses such that no two adjacent entries are
inverses. A finite string is joinable if its first and last entries are not inverses. The sets of
length-$n$ strings, joinable strings, and necklaces are denoted by $\mathcal{S}_n$, $\mathcal{J}_n$, and $\mathcal{F}_n$,
respectively, and the uniform distributions on these sets are denoted by $\nu_n$, $\lambda_n$, and $\mu_n$. Let $A$
be the involutive permutation matrix with rows and columns indexed by $Q$ whose entries
$a(x, y)$ are 1 if $x$ and $y$ are inverses and 0 otherwise. Let $B$ be the matrix with all entries 1.
Then for any $n \ge 1$,

$$|\mathcal{S}_n| = \mathbf{1}^T (B - A)^{n-1} \mathbf{1} \quad\text{and}\quad |\mathcal{J}_n| = \operatorname{trace}\,(B - A)^n,$$

where $\mathbf{1}$ denotes the (column) vector all of whose entries are 1. Similar formulas can be
written for the number of strings (or joinable strings) with specified first and/or last entry.
The matrix $B - A$ is a Perron-Frobenius matrix with lead eigenvalue $(g - 1)$; this eigenvalue
is simple, so both $|\mathcal{S}_n|$ and $|\mathcal{J}_n|$ grow at the precise exponential rate $(g - 1)$, that is, there
exist positive constants $C_S = g/(g - 1)$ and $C_J$ such that

$$|\mathcal{S}_n| \sim C_S (g - 1)^n \quad\text{and}\quad |\mathcal{J}_n| \sim C_J (g - 1)^n.$$
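
The two matrix formulas for the number of strings and joinable strings are easy to evaluate for small cases. The following sketch does so with plain integer matrices, again assuming a hypothetical alphabet `aAbB` with $g = 4$; the exponent $n$ in the trace formula is the reading adopted here (a joinable string of length $n$ is a closed walk of $n$ steps in the graph with adjacency matrix $B - A$). The helper names are illustrative.

```python
LETTERS = "aAbB"  # hypothetical alphabet with g = 4; upper case = inverse


def matmul(X, Y):
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def matpow(M, e):
    """M raised to the non-negative integer power e (identity for e = 0)."""
    size = len(M)
    R = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(e):
        R = matmul(R, M)
    return R


def counts(n):
    """Return (number of strings, number of joinable strings) of length n >= 1."""
    # (B - A)[x][y] = 1 unless y is the inverse of x.
    M = [[0 if y == x.swapcase() else 1 for y in LETTERS] for x in LETTERS]
    strings = sum(sum(row) for row in matpow(M, n - 1))  # 1^T (B - A)^(n-1) 1
    power_n = matpow(M, n)
    joinable = sum(power_n[i][i] for i in range(len(LETTERS)))  # trace (B - A)^n
    return strings, joinable
```

For $g = 4$ the first count agrees with $g(g-1)^{n-1}$, consistent with $C_S = g/(g-1)$.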

Every necklace of length $n$ can be obtained by joining the ends of a joinable string, so
there is a natural surjective mapping $p_n : \mathcal{J}_n \to \mathcal{F}_n$. This mapping is nearly $n$ to 1: In
particular, no necklace has more than $n$ pre-images, and the only necklaces that do not
have exactly $n$ pre-images are those which are periodic with some period $d \mid n$ smaller than
$n$. The number of these exceptional necklaces is vanishingly small compared to the total
number of necklaces. To see this, observe that the total number of strings of length $n \ge 2$
is $g(g - 1)^{n-1}$; hence, the number of joinable strings is between $g(g - 1)^{n-2}$ and $g(g - 1)^{n-1}$.
The number of length-$n$ strings with period $< n$ is bounded above by

$$\sum_{d \mid n,\ d < n} g (g - 1)^{d-1} \le \text{constant} \times (g - 1)^{n/2}.$$

This is of smaller exponential order of magnitude than $|\mathcal{J}_n|$, so for large $n$ most necklaces
of length $n$ will have exactly $n$ pre-images under the projection $p_n$. Consequently, as
$n \to \infty$,

$$|\mathcal{F}_n| \sim C_J (g - 1)^n / n.$$
More important, this implies the following. 

Lemma 4.1. Let $\lambda_n$ be the uniform probability distribution on the set $\mathcal{J}_n$, and let $\mu_n \circ p_n^{-1}$ be the
push-forward to $\mathcal{J}_n$ of the uniform distribution on $\mathcal{F}_n$. Then

(7) $\lim_{n \to \infty} \| \lambda_n - \mu_n \circ p_n^{-1} \|_{TV} = 0.$

Here $\| \cdot \|_{TV}$ denotes the total variation norm on measures; see the Appendix. By
Lemma B.3 of the Appendix, it follows that the distributions of the random variable $N(\alpha)$
under the probability measures $\lambda_n$ and $\mu_n$ are asymptotically indistinguishable.

4.2. The associated Markov measure. The matrix $(B - A)$ has the convenient feature that
its row sums and column sums are all $g - 1$. Therefore, the matrix $P := (B - A)/(g - 1)$ is
a stochastic matrix, with entries

(8) $p(a, b) = \theta$ if $b \ne a^{-1}$, and $p(a, b) = 0$ otherwise, where

(9) $\theta = (g - 1)^{-1}$.

In fact, $P$ is doubly stochastic, that is, both its rows and columns sum to 1. Moreover,
$P$ is aperiodic and irreducible, that is, for some $k \ge 1$ (in this case $k = 2$) the entries
of $P^k$ are strictly positive. It is an elementary result of probability theory that for any
aperiodic, irreducible, doubly stochastic matrix $P$ on a finite set $Q$ there exists a
shift-invariant probability measure $\nu$ on sequence space $\mathcal{S}_\infty$, called a Markov measure, whose
value on the cylinder set $C(x_1 x_2 \cdots x_n)$ consisting of all sequences whose first $n$ entries
are $x_1 x_2 \cdots x_n$ is

(10) $\nu(C(x_1 x_2 \cdots x_n)) = \dfrac{1}{g} \prod_{i=1}^{n-1} p(x_i, x_{i+1}).$

Any random sequence $X = (X_1 X_2 \cdots)$ valued in $\mathcal{S}_\infty$, defined on any probability space
$(\Omega, P)$, whose distribution is $\nu$ is called a stationary Markov chain with transition probability
matrix $P$. In particular, the coordinate process on $(\mathcal{S}_\infty, \nu)$ is a Markov chain with t.p.m. $P$.

Lemma 4.2. Let $X = (X_1 X_2 \ldots)$ be a stationary Markov chain with transition probability
matrix $P$ defined by (8). Then for any $n \ge 1$ the distribution of the random string $X_1 X_2 \cdots X_n$ is the
uniform distribution $\nu_n$ on the set $\mathcal{S}_n$.
Proof. The transition probabilities $p(a, b)$ take only two values, 0 and $\theta$, so for any $n$ the
nonzero cylinder probabilities (10) are all the same. Hence, the distribution of $X_1 X_2 \cdots X_n$
is the uniform distribution on the set of all strings $\xi = x_1 x_2 \cdots x_n$ such that the cylinder
probability $\nu(C(\xi))$ is positive. These are precisely the strings of length $n$. ■
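
The argument of Lemma 4.2 can be checked by exact arithmetic in a small case. The sketch below assumes a hypothetical alphabet `aAbB` ($g = 4$) and evaluates the cylinder probabilities (10) with rational numbers; the names `p`, `cylinder_probability` and `distribution` are illustrative.

```python
from fractions import Fraction
from itertools import product

LETTERS = "aAbB"  # hypothetical alphabet, g = 4; upper case = inverse
G = len(LETTERS)
THETA = Fraction(1, G - 1)


def p(a, b):
    """One-step transition probability (8)."""
    return Fraction(0) if b == a.swapcase() else THETA


def cylinder_probability(word):
    """Formula (10): uniform initial law 1/g times the product of transitions."""
    prob = Fraction(1, G)
    for a, b in zip(word, word[1:]):
        prob *= p(a, b)
    return prob


def distribution(n):
    """Map each length-n word of positive cylinder probability to that probability."""
    result = {}
    for letters in product(LETTERS, repeat=n):
        word = "".join(letters)
        prob = cylinder_probability(word)
        if prob > 0:
            result[word] = prob
    return result
```

The words returned by `distribution(n)` are exactly the reduced words of length $n$, and every one of them carries the same probability, as the proof asserts.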



4.3. Mixing properties of the Markov chain. Because the transition probability matrix $P$
defined by (8) is aperiodic and irreducible, the $m$-step transition probabilities (the entries
of the $m$th power $P^m$ of $P$) approach the stationary (uniform) distribution exponentially
fast. The one-step transition probabilities (8) are simple enough that precise bounds can
be given:

Lemma 4.3. The $m$-step transition probabilities $p_m(a, b)$ of the Markov chain with 1-step
transition probabilities (8) satisfy

(11) $\left| p_m(a, b) - \dfrac{1}{g} \right| \le \theta^m$

where $\theta = 1/(g - 1)$.

Proof. Recall that $P = \theta (B - A)$ where $B$ is the matrix with all entries 1 and $A$ is an
involutive permutation matrix. Hence, $BA = AB = B$ and $B^2 = gB = ((\theta + 1)/\theta) B$. This
implies, by a routine induction argument, that for every integer $m \ge 1$,

$$(B - A)^m = \left( \frac{\theta^{-m} + 1}{g} \right) B - A \quad\text{if } m \text{ is odd, and}$$

$$(B - A)^m = \left( \frac{\theta^{-m} - 1}{g} \right) B + I \quad\text{if } m \text{ is even.}$$

The inequality (11) follows directly. ■
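
The exponential bound of Lemma 4.3 can be confirmed by exact computation of matrix powers. This is a minimal sketch for a hypothetical alphabet `aAbB` ($g = 4$, $\theta = 1/3$), using rational arithmetic; `max_deviation` is an illustrative name.

```python
from fractions import Fraction

LETTERS = "aAbB"  # hypothetical alphabet, g = 4; upper case = inverse
G = len(LETTERS)
THETA = Fraction(1, G - 1)


def transition_matrix():
    """The matrix P = theta * (B - A) of one-step transition probabilities."""
    return [
        [Fraction(0) if y == x.swapcase() else THETA for y in LETTERS]
        for x in LETTERS
    ]


def matmul(X, Y):
    size = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def max_deviation(m):
    """Largest value of |p_m(a, b) - 1/g| over all pairs of letters, m >= 1."""
    P = transition_matrix()
    R = P
    for _ in range(m - 1):
        R = matmul(R, P)
    return max(abs(entry - Fraction(1, G)) for row in R for entry in row)
```

In this example the largest deviation equals $\theta^m (g-1)/g$, in agreement with the closed form for $(B - A)^m$ in the proof, and so stays below $\theta^m$.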

The next lemma is a reformulation of the exponential convergence (11). Let $X =
(X_j)_{j \in \mathbb{Z}}$ be a stationary Markov chain with transition probabilities (8). For any finite
subset $J \subset \mathbb{N}$, let $X_J$ denote the restriction of $X$ to the index set $J$, that is,

$$X_J = (X_j)_{j \in J};$$

for example, if $J$ is the interval $[1, n] := \{1, 2, \ldots, n\}$ then $X_J$ is just the random string
$X_1 X_2 \cdots X_n$. Denote by $\nu_J$ the distribution of $X_J$, viewed as a probability measure on the
set $Q^J$; thus, for any subset $F \subset Q^J$,

(12) $\nu_J(F) = P\{X_J \in F\}.$

If $J$, $K$ are non-overlapping subsets of $\mathbb{N}$, then $\nu_{J \cup K}$ and $\nu_J \times \nu_K$ are both probability
measures on $Q^{J \cup K}$, both with support set equal to the set of all restrictions of infinite
strings.
Lemma 4.4. Let $J, K \subset \mathbb{N}$ be two finite subsets such that $\max(J) + m \le \min(K)$ for some
$m \ge 1$. Then on the support of the measure $\nu_J \times \nu_K$,

(13) $1 - g\theta^m \le \dfrac{d\nu_{J \cup K}}{d(\nu_J \times \nu_K)} \le 1 + g\theta^m$

where $\theta = 1/(g - 1)$ and $d\alpha/d\beta$ denotes the Radon-Nikodym derivative ("likelihood ratio") of the
probability measure $\alpha$ with respect to $\beta$.

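Proof. It suffices to consider the special case where $J$ and $K$ are intervals, because the
general case can be deduced by summing over excluded variables. Furthermore, because
the Markov chain is stationary, the measures $\nu_J$ are invariant by translations (that is,
$\nu_{J+n} = \nu_J$ for any $n \ge 1$), so we may assume that $J = [1, n]$ and $K = [n + m, n + q]$.
Let $x_{J \cup K}$ be the restriction of some infinite string to $J \cup K$, and let $\pi$ denote the
uniform (stationary) distribution on $Q$; then

$$\nu_{J \cup K}(x_{J \cup K}) = \pi(x_1) \left( \prod_{j=1}^{n-1} p(x_j, x_{j+1}) \right) p_m(x_n, x_{n+m}) \left( \prod_{j=n+m}^{n+q-1} p(x_j, x_{j+1}) \right)$$

and

$$\nu_J \times \nu_K(x_{J \cup K}) = \pi(x_1) \left( \prod_{j=1}^{n-1} p(x_j, x_{j+1}) \right) \pi(x_{n+m}) \left( \prod_{j=n+m}^{n+q-1} p(x_j, x_{j+1}) \right).$$

The result now follows directly from the double inequality (11), as this implies that for
any two letters $a$, $b$,

$$\left| \frac{p_m(a, b)}{\pi(b)} - 1 \right| \le g\theta^m.$$

■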

4.4. From random joinable strings to random strings. Since $\mathcal{J}_n \subset \mathcal{S}_n$, the uniform
distribution $\lambda_n$ on $\mathcal{J}_n$ is gotten by restricting the uniform distribution $\nu_n$ on $\mathcal{S}_n$ to $\mathcal{J}_n$ and then
renormalizing:

$$\lambda_n(F) = \frac{\nu_n(F \cap \mathcal{J}_n)}{\nu_n(\mathcal{J}_n)}.$$

Equivalently, the distribution of a random joinable string is the conditional distribution
of a random string given that its first and last entries are not inverses. Our goal here
is to show that the distributions of the random variable $N(\alpha)$ defined by (5) under the
probability measures $\lambda_n$ and $\nu_n$ differ negligibly when $n$ is large. For this we will show
first that the distributions under $\lambda_n$ and $\nu_n$, respectively, of the substring gotten by deleting
the last $n^{1/2 - \varepsilon}$ letters are close in total variation distance; then we will show that changing
the last $n^{1/2 - \varepsilon}$ letters has only a small effect on the value of $N(\alpha)$.

Lemma 4.5. Let $X_1 X_2 \cdots X_n$ be a random string of length $n$, and $Y_1 Y_2 \cdots Y_n$ a random joinable
string. For any integer $m \in [1, n - 1]$ let $\nu_{n,m}$ and $\lambda_{n,m}$ denote the distributions of the random
substrings $X_1 X_2 \cdots X_{n-m}$ and $Y_1 Y_2 \cdots Y_{n-m}$. Then the measure $\lambda_{n,m}$ is absolutely continuous
with respect to $\nu_{n,m}$, and the Radon-Nikodym derivative satisfies

(14) $\dfrac{1 - g\theta^m}{1 + g\theta^m} \le \dfrac{d\lambda_{n,m}}{d\nu_{n,m}} \le \dfrac{1 + g\theta^m}{1 - g\theta^m}$

where $\theta = 1/(g - 1)$. Consequently, the total variation distance between the two measures satisfies

(15) $\| \nu_{n,m} - \lambda_{n,m} \|_{TV} \le \dfrac{2 g\theta^m}{1 - g\theta^m}.$

Proof. The cases $m = 0$ and $m = 1$ are trivial, because in these cases the lower bound
is non-positive and the upper bound is at least 2. The general case $m \ge 2$ follows from
the exponential ergodicity estimates (11) by an argument much like that used to prove
Lemma 4.4. For any string $x_1 x_2 \cdots x_{n-m}$ with initial letter $x_1 = a$,

$$\nu_{n,m}(x_1 x_2 \cdots x_{n-m}) = \frac{1}{g} \prod_{i=1}^{n-m-1} p(x_i, x_{i+1}).$$

Similarly, by Lemma 4.2,

$$\lambda_{n,m}(x_1 x_2 \cdots x_{n-m}) = \frac{1}{g} \left( \prod_{i=1}^{n-m-1} p(x_i, x_{i+1}) \right) \frac{\sum_{b \ne a^{-1}} p_m(x_{n-m}, b)}{g^{-1} \sum_{a} \sum_{b \ne a^{-1}} p_n(a, b)}.$$

Inequality (11) implies that the last fraction in this expression is between the bounding
fractions in (14). The bound on the total variation distance between the two measures
follows routinely. ■

Corollary 4.6. Let $X$ be a stationary Markov chain with transition probability matrix $P$. Assume
that the functions $h_k$ satisfy hypotheses (H0)-(H3) of section 3. Then for all $k, i \ge 1$,

(16) $E|h_k(X, \tau^i X)| \le C\beta^k.$

Proof. The function $h_k(x, \tau^i x)$ is a function only of the coordinates $x_1 x_2 \cdots x_{i+k}$, and so for
any joinable string $x$ of length $\ge i + k$,

$$h_k(x, \tau^i x) = h_k(x, \sigma^i x).$$

By Lemma 4.5 the difference in total variation norm between the distributions of the
substring $x_1 x_2 \cdots x_{i+k}$ under the measures $\lambda_n$ and $\nu_n$ converges to 0 as $n \to \infty$. Therefore,

$$E|h_k(X, \tau^i X)| = \lim_{n \to \infty} E_{\lambda_n} |h_k(\alpha, \sigma^i \alpha)| \le C\beta^k.$$

■



Now we are in a position to compare the distribution of the random variable $N(\alpha)$
under $\mu_n$ with the distribution of a corresponding random variable on the sequence
space under the measure $\nu$. (Recall that the distribution of a random variable $Z$ defined
on a probability space $(\Omega, \mathcal{F}, P)$ is the pushforward measure $P \circ Z^{-1}$. See the Appendix



SELF-INTERSECTIONS IN COMBINATORIAL TOPOLOGY: STATISTICAL STRUCTURE 






for a résumé of common terminology from the theory of probability and basic results
concerning convergence in distribution.) The function $N_n^\tau$ is defined by

(17) $N_n^\tau(x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=1}^{\infty} h_k(\tau^i x, \tau^j x).$

Proposition 4.7. Assume that the functions $h_k$ satisfy hypotheses (H0)-(H3), and let $\kappa = \sum_{k=1}^{\infty} EH_k$.
Let $F_n$ be the distribution of the random variable $(N(\alpha) - n^2 \kappa)/n^{3/2}$ under the uniform
probability measure $\mu_n$ on $\mathcal{F}_n$, and $G_n$ the distribution of $(N_n^\tau(x) - \kappa n^2)/n^{3/2}$ under $\nu$. Then for any
metric $\varrho$ that induces the topology of weak convergence on probability measures,

(18) $\lim_{n \to \infty} \varrho(F_n, G_n) = 0.$

Consequently, $F_n \Longrightarrow \Phi_\sigma$ if and only if $G_n \Longrightarrow \Phi_\sigma$.



Proof. Let $F'_n$ be the distribution of the random variable $(N(\alpha) - n^2 \kappa)/n^{3/2}$ under the
uniform probability measure $\lambda_n$ on $\mathcal{J}_n$. By Lemma 4.1 the total variation distance between $\lambda_n$
and $\mu_n \circ p_n^{-1}$ is vanishingly small for large $n$. Hence, by Lemma B.3 and the fact that total
variation distance is never increased by mapping (cf. inequality (47) of the Appendix),

$$\lim_{n \to \infty} \varrho(F_n, F'_n) = 0.$$

Therefore, it suffices to prove (18) with $F_n$ replaced by $F'_n$.

Partition the sums (5) and (17) as follows. Fix $0 < \delta < 1/2$ and set $m = m(n) = [n^\delta]$. By
hypothesis (H3) and Corollary 4.6,

$$E_\nu \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k > m(n)} |h_k(\tau^i x, \tau^j x)| \le C n^2 \beta^{m(n)} \quad\text{and}$$

$$E_{\lambda_n} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k > m(n)} |h_k(\sigma^i \alpha, \sigma^j \alpha)| \le C n^2 \beta^{m(n)}.$$

These upper bounds are rapidly decreasing in $n$. Hence, by Markov's inequality (i.e.,
the crude bound $P\{|Y| > \varepsilon\} \le E|Y|/\varepsilon$), the distributions of both of the sums converge
weakly to 0 as $n \to \infty$. Thus, by Lemma B.3, to prove the proposition it suffices to prove
that

$$\lim_{n \to \infty} \varrho(F_n^A, G_n^A) = 0$$

where $F_n^A$ and $G_n^A$ are the distributions of the truncated sums obtained by replacing the
inner sums in (5) and (17) by the sums over $1 \le k \le m(n)$.

The outer sums in (5) and (17) are over pairs of indices $1 \le i < j \le n$. Consider those
pairs for which $j > n - 2m(n)$: there are only $2nm(n)$ of these. Since $nm(n) = O(n^{1+\delta})$
and $\delta < 1/2$, and since each term in (5) and (17) is bounded in absolute value by a constant
$C$ (by Hypothesis (H1)), the sum over those index pairs $i < j$ with $n - 2m(n) < j \le n$ is
$o(n^{3/2})$. Hence, by Lemma B.3, it suffices to prove that

$$\lim_{n \to \infty} \varrho(F_n^B, G_n^B) = 0$$

where $F_n^B$ and $G_n^B$ are the distributions under $\lambda_n$ and $\nu$ of the sums (5) and (17) with the
limits of summation changed to $i < j \le n - 2m(n)$ and $k \le m(n)$. Now if $i < j \le n - 2m(n)$
and $k \le m(n)$ then $h_k(\tau^i x, \tau^j x)$ and $h_k(\sigma^i \alpha, \sigma^j \alpha)$ depend only on the first $n - m(n)$ entries
of $x$ and $\alpha$. Consequently, the distributions $F_n^B$ and $G_n^B$ are the distributions of the sums

$$\sum_{i=1}^{n - 2m(n)} \sum_{j=i+1}^{n - 2m(n)} \sum_{k \le m(n)} h_k(\tau^i x, \tau^j x)$$

under the probability measures $\lambda_{n,m}$ and $\nu_{n,m}$, respectively, where $\lambda_{n,m}$ and $\nu_{n,m}$ are as
defined in Lemma 4.5. But the total variation distance between $\lambda_{n,m}$ and $\nu_{n,m}$ converges
to zero, by Lemma 4.5. Therefore, by the mapping principle (47) and Lemma B.3,

$$\varrho(F_n^B, G_n^B) \to 0.$$

■



4.5. Mean Estimates. In this section we show that the hypothesis (H3) is satisfied by the
functions $h_k = u_k + v_k$, where $u_k$ and $v_k$ are as in Proposition 2.1.

Lemma 4.8. Let $\sigma^i$ be the $i$th cyclic shift on the set $\mathcal{J}_n$ of joinable sequences $\alpha$. There exists
$C < \infty$ such that for all $2 \le k \le n$ and $0 \le i < j < n$,

$$E_{\lambda_n} u_k(\sigma^i \alpha, \sigma^j \alpha) \le C\theta^{k/2} \quad\text{and}\quad E_{\lambda_n} v_k(\sigma^i \alpha, \sigma^j \alpha) \le C\theta^{k/2}.$$

Proof. Because the measure $\lambda_n$ is invariant under both cyclic shifts and the reversal
function, it suffices to prove the estimates only for the case where one of the indices $i$, $j$ is
0. If the proper choice is made ($i = 0$ and $j \le n/2$), then a necessary condition for
$u_k(\alpha, \sigma^j \alpha) \ne 0$ is that the strings $\alpha$ and $\sigma^j \alpha$ agree in their second through their $(k - 1)/2$th
slots. By routine counting arguments (as in section 4.1) it can be shown that the number of
joinable strings of length $n$ with this property is bounded above by $C(g - 1)^{n - k/2}$, where
$C < \infty$ is a constant independent of both $n$ and $k \le n$. This proves the first inequality. A
similar argument proves the second. ■

5. U-Statistics of Markov chains 

Proposition 4.7 implies that for large $n$ the distribution $F_n$ considered in Theorem 3.1 is
close in the weak topology to the distribution $G_n$ of the random variable $N_n^\tau$ defined by
(17) under the Markov measure $\nu$. Consequently, if it can be shown that $G_n \Longrightarrow \Phi_\sigma$ then
the conclusion $F_n \Longrightarrow \Phi_\sigma$ will follow, by Lemma B.3 of the Appendix. This will prove
Theorem 3.1.

Random variables of the form (17) are known generically in probability theory as
U-statistics (see [10]). Second order U-statistics of Markov chains are defined as follows. Let
$Z = Z_1 Z_2 \cdots$ be a stationary, aperiodic, irreducible Markov chain on a finite state space
$\mathcal{A}$ with transition probability matrix $Q$ and stationary distribution $\pi$. Let $\tau$ be the forward
shift on the sequence space $\mathcal{A}^{\mathbb{N}}$. The U-statistics of order 2 with kernel $h : \mathcal{A}^{\mathbb{N}} \times \mathcal{A}^{\mathbb{N}} \to \mathbb{R}$
are the random variables

$$W_n := \sum_{i=1}^{n} \sum_{j=i+1}^{n} h(\tau^i Z, \tau^j Z).$$

The Hoeffding projection of a kernel $h$ is the function $H : \mathcal{A}^{\mathbb{N}} \to \mathbb{R}$ defined by

$$H(z) = Eh(z, Z).$$
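
To fix ideas, here is a minimal sketch of a second-order U-statistic and its Hoeffding projection in the simplest situation, where the kernel depends only on the first entry of each argument, so that $h(\tau^i Z, \tau^j Z) = h(Z_i, Z_j)$ and $H(z) = \sum_{z'} h(z, z') \pi(z')$. The function names and the example kernel are hypothetical and serve only as illustration.

```python
from fractions import Fraction


def u_statistic(states, h):
    """W_n = sum over 1 <= i < j <= n of h(z_i, z_j) for a finite path of states."""
    n = len(states)
    return sum(h(states[i], states[j]) for i in range(n) for j in range(i + 1, n))


def hoeffding_projection(h, pi):
    """H(z) = sum over z' of h(z, z') * pi(z'), with pi the stationary distribution."""
    return {z: sum(h(z, w) * weight for w, weight in pi.items()) for z in pi}


def indicator_kernel(z, w):
    """Illustrative symmetric kernel: 1 if the two states coincide, else 0."""
    return 1 if z == w else 0
```

With the uniform distribution on four states, the projection of `indicator_kernel` is constant, equal to 1/4, so the sum of its mean over all $\binom{n}{2}$ pairs is of order $n^2/8$; this is the kind of quadratic centering that appears as $n^2 \kappa$ in the theorems.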

Theorem 5.1. Suppose that $h = \sum_{k=1}^{\infty} h_k$ where $\{h_k\}_{k \ge 1}$ is a sequence of kernels satisfying
hypotheses (H0)-(H2) and the following: There exist constants $C < \infty$ and $0 < \beta < 1$ such that
for all $k, i \ge 1$,

(20) $E|h_k(Z, \tau^i Z)| \le C\beta^k.$

Then as $n \to \infty$,

(21) $\dfrac{W_n - n^2 \kappa}{n^{3/2}} \Longrightarrow \Phi_\sigma$

where the constants $\kappa$ and $\sigma^2$ are

(22) $\kappa = \sum_{k=1}^{\infty} EH_k(Z)$ and $\sigma^2 = \lim_{n \to \infty} \dfrac{1}{n} E\left( \sum_{i=1}^{n} \sum_{k=1}^{\infty} H_k(\tau^i Z) - n\kappa \right)^2.$

There are similar theorems in the literature, but all require some degree of additional
continuity of the kernel $h$. In the special case where all but finitely many of the functions
$h_k$ are identically 0 the result is a special case of Theorem 1 of [9] or Theorem 2 of [11].
If the functions $h_k$ satisfy the stronger hypothesis that $|h_k| \le C\beta^k$ pointwise then the
result follows (with some work) from Theorem 2 of [11]. Unfortunately, the special case
of interest to us, where $h_k = u_k + v_k$ and $u_k$, $v_k$ are the functions defined in sec. 2, does not
satisfy this hypothesis.

The rest of section 5 is devoted to the proof. The main step is to reduce the problem
to the special case where all but finitely many of the functions $h_k$ are identically 0 by
approximation; this is where the hypothesis (20) will be used. The special case, as already
noted, can be deduced from the results of [9] or [11], but instead we shall give a short and
elementary argument.

In proving Theorem 5.1 we can assume that all of the Hoeffding projections $H_k$ have
mean

$$E H_k(Z) = 0,$$

because subtracting a constant from both $h$ and $\kappa$ has no effect on the validity of the
theorem. Note that this does not imply that $E h_k(\tau^i Z, \tau^j Z) = 0$, but it does imply (by
Fubini's theorem) that if $Z$ and $Z'$ are independent copies of the Markov chain then

$$E h_k(Z, Z') = 0.$$

5.1. Proof in the special case. If all but finitely many of the functions $h_k$ are 0 then for
some finite value of $K$ the kernel $h$ depends only on the first $K$ entries of its arguments.

Lemma 5.2. Without loss of generality, we can assume that K = 1. 

Proof. If $Z_1 Z_2 \cdots$ is a stationary Markov chain, then so is the sequence $Z_1^K Z_2^K \cdots$, where
$Z_i^K$ is the length-$(K+1)$ word obtained by concatenating the $K+1$ states of the original
Markov chain following $Z_i$. Hence, the U-statistics $W_n$ can be represented as U-statistics
on a different Markov chain with kernel depending only on the first entries of its arguments.
It is routine to check that the constants $\kappa$ and $\sigma^2$ defined by (22) for the chain $Z_i^K$
equal those defined by (22) for the original chain. ■
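
The block construction in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `block_chain` and `lifted_kernel` are hypothetical, and the windows are taken to start at $Z_i$ itself (the text only says the block collects the $K+1$ states "following $Z_i$", so the exact offset is an assumption).

```python
def block_chain(path, K):
    """Sliding windows (Z_i, ..., Z_{i+K}) of a state path, as tuples.

    Assumption: each block starts at Z_i; shifting every window by one
    position would serve equally well for the argument in the proof.
    """
    return [tuple(path[i:i + K + 1]) for i in range(len(path) - K)]


def lifted_kernel(h, K):
    """Wrap a kernel h that reads length-K prefixes so that it reads a single block."""
    return lambda block_a, block_b: h(block_a[:K], block_b[:K])


# Toy path on the state space {0, 1, 2}; the values are illustrative only.
path = [0, 1, 1, 2, 0, 2, 1]
blocks = block_chain(path, K=2)

# Toy kernel: indicator that the two length-K prefixes agree.
same_prefix = lambda a, b: int(a == b)
h_on_blocks = lifted_kernel(same_prefix, K=2)
```

Consecutive blocks overlap in $K$ entries, which is why the block sequence is again a Markov chain and why a kernel depending on the first $K$ letters becomes a kernel depending on the first block only.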

Assume now that $h$ depends only on the first entries of its arguments. Then the Hoeffding
projection $H$ also depends only on the first entry of its argument, and can be written as

$$H(z) = E h(z, Z_1) = \sum_{z' \in \mathcal{A}} h(z, z')\,\pi(z').$$

Since the Markov chain $Z_n$ is stationary and ergodic, the covariances
$E H(Z_i) H(Z_{i+n}) = E H(Z_1) H(Z_{1+n})$ decay exponentially in $n$, so the limit

$$\sigma^2 := \lim_{n\to\infty} \frac{1}{n} E\Bigl(\sum_{j=1}^{n} H(Z_j)\Bigr)^2 \tag{23}$$

exists and is nonnegative. It is an elementary fact that $\sigma^2 > 0$ unless $H \equiv 0$. Say that the
kernel $h$ is centered if $H \equiv 0$. If $h$ is not centered then the adjusted kernel

$$h^*(z, z') := h(z, z') - H(z) - H(z') \tag{24}$$

is centered, because its Hoeffding projection satisfies

$$H^*(z) := E h^*(z, Z_1) = E h(z, Z_1) - E H(z) - E H(Z_1) = H(z) - H(z) - 0.$$

Define

$$T_n = \sum_{i=1}^{n}\sum_{j=1}^{n} h(Z_i, Z_j) \quad\text{and}\quad D_n = \sum_{i=1}^{n} h(Z_i, Z_i);$$

then since the kernel $h$ is symmetric,

$$W_n = \tfrac{1}{2}(T_n - D_n). \tag{25}$$
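
The identity relating $W_n$, $T_n$ and $D_n$ is purely algebraic and can be checked on any sample path. The following Python sketch does so for a toy three-state chain; the transition matrix `Q`, the stationary law `pi`, the symmetric integer kernel `h` and all function names are illustrative assumptions, not objects from the paper.

```python
import random


def simulate_chain(Q, pi, n, rng):
    """Sample Z_1, ..., Z_n from a finite-state chain started in the law pi."""
    states = range(len(pi))
    z = rng.choices(states, weights=pi)[0]
    path = [z]
    for _ in range(n - 1):
        z = rng.choices(states, weights=Q[z])[0]
        path.append(z)
    return path


def u_statistic(h, path):
    """W_n: sum of h(Z_i, Z_j) over pairs i < j (kernel reads first entries only)."""
    n = len(path)
    return sum(h[path[i]][path[j]] for i in range(n) for j in range(i + 1, n))


def full_and_diagonal_sums(h, path):
    """T_n (sum over all ordered pairs) and D_n (sum over the diagonal)."""
    t_n = sum(h[a][b] for a in path for b in path)
    d_n = sum(h[a][a] for a in path)
    return t_n, d_n


# Toy chain: lazy walk on {0, 1, 2}; pi is its stationary distribution.
Q = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
# Symmetric integer kernel, so the identity can be checked exactly.
h = [[2, -1, 0], [-1, 3, 4], [0, 4, -5]]

path = simulate_chain(Q, pi, 200, random.Random(0))
w_n = u_statistic(h, path)
t_n, d_n = full_and_diagonal_sums(h, path)
```

Symmetry of `h` is what makes the sum over ordered pairs equal to twice the sum over pairs $i < j$ plus the diagonal; with a non-symmetric kernel the check would fail in general.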
Proposition 5.3. If $h$ is centered, then

$$T_n / n \Longrightarrow Q \tag{26}$$

where $Q$ is a quadratic form in no more than $m = |\mathcal{A}|$ independent, standard normal random
variables.

Proof. Consider the linear operator on $\ell^2(\mathcal{A}, \pi)$ defined by

$$L_h f(z) := \sum_{z' \in \mathcal{A}} h(z, z') f(z') \pi(z').$$

This operator is symmetric (real Hermitian), and consequently has a complete set of
orthonormal real eigenvectors $\varphi_j(z)$ with real eigenvalues $\lambda_j$. Since $h$ is centered, the
constant function $\varphi_1 := 1/\sqrt{m}$ is an eigenvector with eigenvalue $\lambda_1 = 0$; therefore, all of the
other eigenvectors $\varphi_j$, being orthogonal to $\varphi_1$, must have mean zero. Hence, since $\lambda_1 = 0$,

$$h(z, z') = \sum_{j=2}^{m} \lambda_j \varphi_j(z) \varphi_j(z'),$$

and so

$$T_n = \sum_{k=2}^{m} \lambda_k \sum_{i=1}^{n}\sum_{j=1}^{n} \varphi_k(Z_i)\varphi_k(Z_j) = \sum_{k=2}^{m} \lambda_k \Bigl(\sum_{i=1}^{n} \varphi_k(Z_i)\Bigr)^2. \tag{27}$$

Since each $\varphi_k$ has mean zero and variance 1 relative to $\pi$, the central limit theorem for
Markov chains implies that as $n \to \infty$,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi_k(Z_i) \Longrightarrow \text{Normal}(0, \sigma_k^2), \tag{28}$$

with limiting variances $\sigma_k^2 \ge 0$. In fact, these normalized sums converge jointly$^1$ (for
$k = 2, 3, \ldots, m$) to a multivariate normal distribution with marginal variances $\sigma_k^2 \ge 0$.
The result therefore follows from the spectral representation (27). ■

Corollary 5.4. If $h$ is not centered, then with $\sigma^2 > 0$ as defined in (23),

$$W_n / n^{3/2} \Longrightarrow \text{Normal}(0, \sigma^2). \tag{29}$$

$^1$Note, however, that the normalized sums in (28) need not be asymptotically independent for different $k$,
despite the fact that the different functions $\varphi_k$ are uncorrelated relative to $\pi$. This is because the arguments $Z_i$
are serially correlated: in particular, even though $\varphi_k(Z_i)$ and $\varphi_l(Z_i)$ are uncorrelated, the random variables
$\varphi_k(Z_i)$ and $\varphi_l(Z_{i+1})$ might well be correlated.

Proof. Recall that $W_n = (T_n - D_n)/2$. By the ergodic theorem, $\lim_{n\to\infty} D_n/n = E h(Z_1, Z_1)$
almost surely, so $D_n / n^{3/2} \Rightarrow 0$. Hence, by Lemma B.3 of the Appendix, it suffices to prove
that if $h$ is not centered then

$$T_n / n^{3/2} \Longrightarrow \text{Normal}(0, 4\sigma^2). \tag{30}$$

Define the centered kernel $h^*$ as in (24). Since the Hoeffding projection of $h^*$ is identically 0,

$$T_n = T_n^* + 2n \sum_{i=1}^{n} H(Z_i) \quad\text{where}\quad T_n^* = \sum_{i=1}^{n}\sum_{j=1}^{n} h^*(Z_i, Z_j).$$

Proposition 5.3 implies that $T_n^*/n$ converges in distribution, and it follows that $T_n^*/n^{3/2}$
converges to 0 in distribution. On the other hand, the central limit theorem for Markov
chains implies that

$$n^{-3/2}\Bigl(2n \sum_{i=1}^{n} H(Z_i)\Bigr) \Longrightarrow \text{Normal}(0, 4\sigma^2),$$

with $\sigma^2 > 0$, since by hypothesis the kernel $h$ is not centered. The weak convergence (30)
now follows by Lemma B.3. ■



5.2. Variance/covariance bounds. To prove Theorem 5.1 in the general case we will show
that truncation of the kernel $h$, that is, replacing $h = \sum_{k=1}^{\infty} h_k$ by $h^K = \sum_{k=1}^{K} h_k$, has only
a small effect on the distributions of the normalized random variables $W_n/n^{3/2}$ when $K$
is large. For this we will use second moment bounds. To deduce these from the first-moment
hypothesis (20) we shall appeal to the fact that any aperiodic, irreducible, finite-state
Markov chain is exponentially mixing. Exponential mixing is expressed in the same
manner as for the Markov chain considered in section 4.3. For any finite subset $J \subset \mathbb{N}$,
let $Z_J = (Z_j)_{j \in J}$ denote the restriction of $Z$ to the index set $J$, and denote by $\mu_J$ the
distribution of $Z_J$. If $I, J$ are nonoverlapping subsets of $\mathbb{N}$ then both $\mu_{I \cup J}$ and $\mu_I \times \mu_J$
are probability measures supported by $\mathcal{A}^{I \cup J}$. If the distance between the sets $I$ and $J$ is
at least $m_*$, where $m_*$ is the smallest integer such that all entries of $Q^{m_*}$ are positive, then
$\mu_{I \cup J}$ and $\mu_I \times \mu_J$ are mutually absolutely continuous.

Lemma 5.5. There exist constants $C < \infty$ and $0 < \delta < 1$ such that for any two subsets $I, J \subset \mathbb{N}$
satisfying $\min(J) - \max(I) = m \ge m_*$,

$$1 - C\delta^m \le \frac{d\mu_{I \cup J}}{d(\mu_I \times \mu_J)} \le 1 + C\delta^m.$$

The constant $C$ need not be the same as the constant in the hypothesis (20); however,
the exact values of these constants are irrelevant to our purposes, and so we shall minimize
notational clutter by using the letter $C$ generically for such constants. The proof of
the lemma is nearly identical to that of Lemma 4.4, except that the exponential convergence
bounds of Lemma 4.3 must be replaced by corresponding bounds for the transition
probabilities of $Z$. The corresponding bounds are obtained from the Perron-Frobenius
theorem.
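
To make the likelihood-ratio bound concrete, consider the simplest case of singletons $I = \{i\}$ and $J = \{i+m\}$. Then the density of $\mu_{I\cup J}$ with respect to $\mu_I \times \mu_J$ at $(a, b)$ is $\pi(a) Q^m(a,b) / (\pi(a)\pi(b)) = Q^m(a,b)/\pi(b)$. The Python sketch below evaluates the largest deviation of this ratio from 1 for a toy three-state chain; the matrix `Q`, the law `pi` and the function names are illustrative assumptions, not taken from the paper.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(Q, m):
    """Return the m-th power of the square matrix Q (m >= 0)."""
    n = len(Q)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = mat_mul(R, Q)
    return R


def ratio_deviation(Q, pi, m):
    """Largest |Q^m(a, b) / pi(b) - 1| over all pairs of states (a, b)."""
    Qm = mat_pow(Q, m)
    states = range(len(pi))
    return max(abs(Qm[a][b] / pi[b] - 1.0) for a in states for b in states)


# Toy chain: lazy walk on {0, 1, 2} with eigenvalues 1, 1/2, 0.
Q = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

# Here m_* = 2 (Q has zero entries, Q^2 does not).
deviations = {m: ratio_deviation(Q, pi, m) for m in range(2, 12)}
```

For this particular chain the deviation equals $2 \cdot (1/2)^m$, so the bound holds with $C = 2$ and $\delta = 1/2$; the decay rate is the second-largest eigenvalue modulus, in line with the Perron-Frobenius argument mentioned above.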

For any two random variables $U, V$ denote by $\text{cov}(U, V) = E(UV) - EU\,EV$ their
covariance. (When $U = V$ the covariance $\text{cov}(U, V) = \text{Var}(U)$.)

Lemma 5.6. For any two pairs $i < j$ and $i' < j'$ of indices, let $\Delta = \Delta(i, i', j, j')$ be the distance
between the sets $\{i, j\}$ and $\{i', j'\}$ (that is, the minimum distance between one of $i, j$ and one of
$i', j'$). Then for suitable constants $0 < C, C' < \infty$, for all $\Delta \ge \max(k, k') + m_*$,

$$|\text{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C' \beta^{k+k'} \varrho_{\Delta - \max(k, k')} \tag{31}$$

where

$$\varrho_m = (1 + C\delta^m)^5 - 1 \quad\text{for } m \ge m_*,$$

and $\beta$ is the exponential decay rate in equation (20).

Remark 5.7. What is important is that the covariances decay exponentially in both $k + k'$
and $\Delta$; the rates will not matter. When $\Delta < \max(k, k') + m_*$ the bounds (31) do not apply.
However, in this case, since the functions $h_k$ are bounded above in absolute value by a
constant $C < \infty$ independent of $k$ (hypothesis (H1)), the Cauchy-Schwarz inequality
implies

$$\begin{aligned}
|\text{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))|^2 &= \bigl(E\, h_k(\tau^i Z, \tau^j Z)\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)\bigr)^2 \\
&\le E\, h_k(\tau^i Z, \tau^j Z)^2 \; E\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)^2 \\
&\le C^2\, E|h_k(\tau^i Z, \tau^j Z)| \; E|h_{k'}(\tau^{i'} Z, \tau^{j'} Z)|,
\end{aligned}$$

the last by the first moment hypothesis (20).

Proof of Lemma 5.6. Since the random variables $h_k$ are functions only of the first $k$ letters
of their arguments, the covariances can be calculated by averaging against the measures
$\mu_{J \cup K}$, where

$$J = [i, i+k] \cup [j, j+k] \quad\text{and}\quad K = [i', i'+k'] \cup [j', j'+k'].$$

The simplest case is where $j + k < i'$; in this case the result of Lemma 5.5 applies directly,
because the sets $J$ and $K$ are separated by $m = \Delta - k$. Since the functions $h_k$ are uniformly
bounded, Lemma 5.5 implies

$$1 - C\delta^m \le \frac{E\, h_k(\tau^i Z, \tau^j Z)\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)}{E\, h_k(\tau^i Z, \tau^j Z)\; E\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)} \le 1 + C\delta^m.$$

The inequalities in (31) now follow, by the assumption (20). (In this special case the
bounds obtained are tighter than those in (31).)

The other cases are similar, but the exponential ergodicity estimate (13) must be used
indirectly, since the index sets $J$ and $K$ need not be ordered as required by Lemma 5.5.
Consider, for definiteness, the case where

$$i + k < i' < i' + k' < j < j + k < j' < j' + k'.$$

To bound the relevant likelihood ratio in this case, use the factorization

$$\frac{d\mu_{J \cup K}}{d(\mu_J \times \mu_K)} = \frac{d\mu_{J \cup K}}{d(\mu_{J^- \cup K^-} \times \mu_{J^+ \cup K^+})} \cdot \frac{d(\mu_{J^- \cup K^-} \times \mu_{J^+ \cup K^+})}{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})} \cdot \frac{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})}{d(\mu_J \times \mu_K)}$$

where $J^- = [i, i+k]$, $J^+ = [j, j+k]$, $K^- = [i', i'+k']$, and $K^+ = [j', j'+k']$. For
the second and third factors, use the fact that Radon-Nikodym derivatives of product
measures factor, e.g.,

$$\frac{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})}{d(\mu_J \times \mu_K)}(x_J, x_K) = \frac{d(\mu_{J^-} \times \mu_{J^+})}{d\mu_J}(x_J) \times \frac{d(\mu_{K^-} \times \mu_{K^+})}{d\mu_K}(x_K).$$

Now Lemma 5.5 can be used to bound each of the resulting five factors. This yields the
following inequalities:

$$(1 - C\delta^m)^5 \le \frac{d\mu_{J \cup K}}{d(\mu_J \times \mu_K)} \le (1 + C\delta^m)^5,$$

and so by the same reasoning as used earlier,

$$(1 - C\delta^m)^5 \le \frac{E\, h_k(\tau^i Z, \tau^j Z)\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)}{E\, h_k(\tau^i Z, \tau^j Z)\; E\, h_{k'}(\tau^{i'} Z, \tau^{j'} Z)} \le (1 + C\delta^m)^5.$$

The remaining cases can be handled in the same manner. ■
Corollary 5.8. There exist $C, C' < \infty$ such that for all $n \ge 1$ and all $1 \le K \le L \le \infty$,

$$\text{Var}\Bigl(\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{k=K}^{L} h_k(\tau^i Z, \tau^j Z)\Bigr) \le C n^3 \sum_{k=K}^{\infty}\sum_{k'=K}^{\infty} \bigl\{(k' + k + C')\beta^{k+k'}\bigr\}. \tag{32}$$

Consequently, for any $\varepsilon > 0$ there exists $K < \infty$ such that for all $n \ge 1$,

$$\text{Var}\Bigl(\sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{k=K+1}^{\infty} h_k(\tau^i Z, \tau^j Z)\Bigr) \le \varepsilon n^3. \tag{33}$$

Proof. The variance is obtained by summing the covariances of all possible pairs of terms in
the sum. Group these by size, according to the value of $\Delta(i, i', j, j')$: for any given value
of $\Delta \ge 2$, the number of quadruples $i, i', j, j'$ in the range $[1, n]$ with $\Delta(i, i', j, j') = \Delta$ is
no greater than $24n^3$. For each such quadruple and any pair $k, k'$ such that $K \le k \le k'$,
Lemma 5.6 implies that if $\Delta \ge k' + m_*$ then

$$|\text{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C\beta^{k+k'} \varrho_{\Delta - k'}.$$

If $\Delta < m_* + k'$ then the crude Cauchy-Schwarz bounds of Remark 5.7 imply that

$$|\text{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C\beta^{k+k'}$$

where $C < \infty$ is a constant independent of $i, i', j, j', k, k'$. Summing these bounds we find
that the variance on the left side of (32) is bounded by

$$C n^3 \sum_{k=K}^{L}\sum_{k'=K}^{L} \beta^{k+k'}\Bigl(m_* + k + k' + \sum_{\Delta = m_*}^{\infty} \varrho_\Delta\Bigr).$$

Since $\varrho_j$ is exponentially decaying in $j$, the inner sum is finite. This proves the inequality
(32). The second assertion now follows. ■



5.3. Proof of Theorem 5.1. Given Corollary 5.8 (in particular, the assertion (33)), Theorem 5.1
follows from the special case where all but finitely many of the functions $h_k$
are identically zero, by Lemma B.4 of the Appendix. To see this, observe that under the
hypotheses of Theorem 5.1 the random variable $W_n$ can be partitioned as

$$W_n = W_n^K + R_n^K$$

where

$$W_n^K = \sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{k=1}^{K} h_k(\tau^i Z, \tau^j Z) \quad\text{and}\quad R_n^K = \sum_{i=1}^{n}\sum_{j=i+1}^{n}\sum_{k=K+1}^{\infty} h_k(\tau^i Z, \tau^j Z).$$

By Proposition 5.3 and Corollary 5.4, for any finite $K$ the sequence $W_n^K / n^{3/2}$ converges to
a normal distribution with mean 0 and finite (but possibly zero) variance $\sigma_K^2$. By (33), for
any $\varepsilon > 0$ there exists $K < \infty$ such that $E|R_n^K|^2 / n^3 < \varepsilon$ for all $n \ge 1$. Consequently, by
Lemma B.4, $\sigma^2 = \lim_{K \to \infty} \sigma_K^2$ exists and is finite, and

$$W_n / n^{3/2} \Longrightarrow \text{Normal}(0, \sigma^2).$$



6. Mean/Variance Calculations

In this section we verify that in the special case $h_k = u_k + v_k$, where $u_k$ and $v_k$ are the
functions defined in section 2 and $h_1 = 0$, the constants $\kappa$ and $\sigma^2$ defined by (22) coincide
with the values (1).

Assume throughout this section that $X = X_1 X_2 \cdots$ and $X' = X'_1 X'_2 \cdots$ are two
independent stationary Markov chains with transition probabilities (8), both defined on a
probability space $(\Omega, P)$ with corresponding expectation operator $E$. Set $h_k = u_k + v_k$. For
each fixed (nonrandom) string $x_1 x_2 \cdots$ of length $\ge k$ define

$$H_k = U_k + V_k \quad\text{where} \tag{34}$$

$$U_k(x_1 x_2 \cdots) = E u_k(x_1 x_2 \cdots x_k, X'_1 X'_2 \cdots) \quad\text{and}\quad V_k(x_1 x_2 \cdots) = E v_k(x_1 x_2 \cdots x_k, X'_1 X'_2 \cdots), \tag{35}$$

and set

$$S_k(x_1 x_2 \cdots) = U_2(x_1 x_2) + \sum_{l=3}^{k} (U_l + V_l)(x_1 x_2 \cdots x_l). \tag{36}$$

Since the summands are all nonnegative and satisfy hypotheses (H0)-(H3), the last sum
is well-defined and finite even for $k = \infty$. The restrictions of $U_k$ and $V_k$ to the space of
infinite sequences are the Hoeffding projections of the functions $u_k$ and $v_k$ (see section 3).
Note that each of the functions $U_k, V_k, H_k$ depends only on the first $k$ letters of the string
$x_1 x_2 \cdots$. By equations (22) of Theorem 5.1 the limit constants $\kappa$ and $\sigma^2$ are

$$\kappa = \sum_{k=2}^{\infty} E H_k(X) \quad\text{and}\quad \sigma^2 = \lim_{n\to\infty} \frac{1}{n} E\Bigl(\sum_{i=1}^{n}\sum_{k=2}^{\infty} H_k(\tau^i X) - n\kappa\Bigr)^2.$$

We will prove (Corollary 6.4) that in the particular case of interest here, where $h_k = u_k + v_k$,
the random variables $H_k(\tau^i X)$ and $H_{k'}(\tau^{i'} X)$ are uncorrelated unless $i = i'$ and $k = k'$. It
then follows that the terms of the sequence defining $\sigma^2$ are all equal, and so

$$\sigma^2 = \sum_{k=2}^{\infty} \text{Var}(H_k(X)).$$

Lemma 6.1. For each string $x_1 x_2 \cdots x_k$ of length $k \ge 2$ and each index $i \le k - 1$, define
$j_i = j_i(x_1 x_2 \cdots x_k)$ to be the number of letters between $x_i$ and $x_{i+1}$ in the reference word $O$ in the
clockwise direction (see Figure 3). Then

$$V_2 = 0 \quad\text{and}\quad U_k = V_k \ \text{ for all } k \ge 3, \tag{37}$$

and

$$U_k(x_1 x_2 \cdots) = \frac{t(j_1, j_{k-1})}{g(g-1)^{k-1}} \tag{38}$$

where $t(a, b) = a(g - 2 - b) + b(g - 2 - a)$.
Therefore,

$$S_k(x_1 x_2 \cdots) = \frac{t(j_1, j_1)}{g(g-1)} + 2\sum_{l=3}^{k} \frac{t(j_1, j_{l-1})}{g(g-1)^{l-1}}. \tag{39}$$

FIGURE 3. The interval of length $j_i$ in $O$.



Proof. The Markov chain with transition probabilities (8) is reversible (the transition
probability matrix (8) is symmetric), and the transition probabilities are unchanged by
inversion $a \mapsto \bar{a}$ and $a' \mapsto \bar{a}'$. Hence, the random strings $X'_1 X'_2 \cdots X'_k$ and $X'_k X'_{k-1} \cdots X'_1$ have
the same distribution. It follows that for each $k > 2$,

$$U_k(x_1 x_2 \cdots) = V_k(x_1 x_2 \cdots).$$

Consider the case $k = 2$. In order that $u_2(x_1 x_2, X'_1 X'_2) \ne 0$ it is necessary and sufficient
that the letters $x_1 X'_1 x_2 X'_2$ occur in cyclic order (either clockwise or counterclockwise) in
the reference word $O$. For clockwise cyclic ordering, the letter $X'_1$ must be one of the $j_1$
letters between $x_1$ and $x_2$, and $X'_2$ must be one of the $g - 2 - j_1$ letters between $x_2$ and
$x_1$. Similarly, for counterclockwise cyclic ordering, $X'_1$ must be one of the $g - 2 - j_1$ letters
between $x_2$ and $x_1$, and $X'_2$ one of the $j_1$ letters between $x_1$ and $x_2$. But $X'_1$, and hence
also its inverse $\bar{X}'_1$, is uniformly distributed on the $g$ letters, and given the value of $X'_1$ the
random variable $X'_2$ is uniformly distributed on the remaining $(g - 1)$ letters. Therefore,

$$U_2(x_1 x_2) = \frac{t(j_1, j_1)}{g(g-1)}.$$

The case $k \ge 3$ is similar. In order that $u_k(x_1 x_2 \cdots, X'_1 X'_2 \cdots)$ be nonzero it is necessary
and sufficient that the strings $x_1 x_2 \cdots x_k$ and $X'_1 X'_2 \cdots X'_k$ differ precisely in the first
and $k$th entries, and that the letters $x_1 X'_1 x_2$ occur in the same cyclic order as the letters
$x_k X'_k x_{k-1}$. This order will be clockwise if and only if $X'_1$ is one of the $j_1$ letters between $x_1$
and $x_2$ and $X'_k$ is one of the $g - 2 - j_{k-1}$ letters between $x_k$ and $x_{k-1}$. The order will be
counterclockwise if and only if $X'_1$ is one of the $g - 2 - j_1$ letters between $x_2$ and $x_1$ and $X'_k$ is
one of the $j_{k-1}$ letters between $x_{k-1}$ and $x_k$. Observe that all of these possible choices will
lead to reduced words $X'_1 x_2 x_3 \cdots x_{k-1} X'_k$. By (8), the probability of one of these events
occurring is

$$U_k(x_1 x_2 \cdots x_k) = \frac{t(j_1, j_{k-1})}{g(g-1)^{k-1}}.$$



For $i = 1, 2, \ldots$, define $J_i = j_i(X_1 X_2 \cdots)$ to be the random variable obtained by
evaluating the function $j_i$ at a random string generated by the Markov chain, that is, $J_i$ is the
number of letters between $X_i$ and $X_{i+1}$ in the reference word $O$ in the clockwise direction.
Because $X_{i+1}$ is obtained by randomly choosing one of the letters of $O$ other than
$X_i$, the random variable $J_i$ is independent of $X_i$. Since these random choices are all made
independently, the following is true:

Lemma 6.2. The random variables $X_1, J_1, J_2, \ldots$ are mutually independent, and each $J_i$ has the
uniform distribution on the set $\{0, 1, 2, \ldots, g - 2\}$. Consequently,

$$\begin{aligned}
E J_i &= (g-2)/2, \\
E J_i^2 &= (g-2)(2g-3)/6, \\
E J_i^3 &= (g-2)^2 (g-1)/4, \\
E J_i^4 &= (g-2)(2g-3)(3g^2 - 9g + 5)/30, \quad\text{and} \\
E J_i J_{i'} &= E J_i \, E J_{i'} = (g-2)^2/4 \quad\text{for } i \ne i'.
\end{aligned} \tag{40}$$
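
The moment formulas of Lemma 6.2 are the standard power-sum identities for a uniform variable on $\{0, 1, \ldots, g-2\}$, and they are easy to confirm by brute force. The following Python sketch compares exact moments against the closed forms for a range of $g$; the function names are illustrative only.

```python
from fractions import Fraction


def uniform_moment(g, p):
    """Exact p-th moment of J uniform on {0, 1, ..., g - 2}."""
    return Fraction(sum(j ** p for j in range(g - 1)), g - 1)


def closed_forms(g):
    """Closed-form first four moments, as stated in Lemma 6.2."""
    g = Fraction(g)
    return {
        1: (g - 2) / 2,
        2: (g - 2) * (2 * g - 3) / 6,
        3: (g - 2) ** 2 * (g - 1) / 4,
        4: (g - 2) * (2 * g - 3) * (3 * g * g - 9 * g + 5) / 30,
    }


# Compare brute-force moments with the closed forms for g = 2, ..., 39.
mismatches = [
    (g, p)
    for g in range(2, 40)
    for p, value in closed_forms(g).items()
    if uniform_moment(g, p) != value
]
```

Using `Fraction` keeps the comparison exact, so an empty `mismatches` list confirms the formulas for every tested $g$ rather than only up to rounding.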



By Lemma 6.1 the conditional expectations $U_k, V_k$ are quadratic functions of the cycle
gaps $J_1, J_2, \ldots$. Consequently, the unconditional expectations

$$E u_k(X, X') = E U_k(X)$$

can be deduced from the elementary formulas of Lemma 6.2 by linearity of expectation.
Consider first the case $k \ge 3$:

$$\begin{aligned}
g(g-1)^{k-1} E U_k(X) &= E t(J_1, J_{k-1}) \\
&= 2 E J_1 (g - 2 - J_{k-1}) \\
&= 2(g-2) E J_1 - 2 E J_1 J_{k-1} \\
&= (g-2)^2 - (g-2)^2/2 \\
&= (g-2)^2/2.
\end{aligned}$$

For $k = 2$:

$$\begin{aligned}
g(g-1) E U_2(X) &= E t(J_1, J_1) \\
&= 2 E J_1 (g - 2 - J_1) \\
&= 2(g-2) E J_1 - 2 E J_1^2 \\
&= (g-2)^2 - (g-2)(2g-3)/3 \\
&= (g-2)(g-3)/3.
\end{aligned}$$
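
The two mean computations above can be cross-checked by enumerating the uniform gaps directly. The Python sketch below does this and also sums $E U_2 + \sum_{k \ge 3} 2 E U_k$ (using $U_k = V_k$ for $k \ge 3$); a direct geometric-series calculation from the two displayed means gives the limit $(g-2)/(3(g-1))$, and the test checks the partial sums against it. The function names are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction


def t(a, b, g):
    """The quadratic form t(a, b) = a(g - 2 - b) + b(g - 2 - a)."""
    return a * (g - 2 - b) + b * (g - 2 - a)


def mean_t_distinct(g):
    """E t(J_1, J_{k-1}) for two independent uniform gaps (case k >= 3)."""
    vals = range(g - 1)
    return Fraction(sum(t(a, b, g) for a in vals for b in vals), (g - 1) ** 2)


def mean_t_same(g):
    """E t(J_1, J_1) for a single uniform gap (case k = 2)."""
    vals = range(g - 1)
    return Fraction(sum(t(a, a, g) for a in vals), g - 1)


def kappa_partial(g, K):
    """E U_2 + sum_{k=3}^{K} 2 E U_k, with E U_k = E t / (g (g-1)^(k-1))."""
    total = mean_t_same(g) / (g * (g - 1))
    for k in range(3, K + 1):
        total += 2 * mean_t_distinct(g) / (g * (g - 1) ** (k - 1))
    return total


def kappa_limit(g):
    """Value of the full series, obtained by summing the geometric tail."""
    return Fraction(g - 2, 3 * (g - 1))
```

For example, `kappa_partial(4, 3)` is `1/6`, and the gap to `kappa_limit(4) = 2/9` shrinks by a factor $g - 1 = 3$ with each additional term.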
Corollary 6.3. If $X = X_1 X_2 \cdots$ and $X' = X'_1 X'_2 \cdots$ are independent realizations of the
stationary Markov chain with transition probabilities (8), then

$$E u_2(X, X') = E U_2(X) = \frac{(g-2)(g-3)}{3g(g-1)}, \tag{41}$$

$$E u_k(X, X') = E U_k(X) = \frac{(g-2)^2}{2g(g-1)^{k-1}} \quad\text{for } k \ge 3, \tag{42}$$

and

$$E S_\infty(X) = E u_2(X, X') + \sum_{k=3}^{\infty} E(u_k + v_k)(X, X') = \frac{g-2}{3(g-1)}. \tag{43}$$

The variances and covariances of the random variables $U_k(X)$ can be calculated in similar
fashion, using the independence of the cycle gaps $J_k$ and the moment formulas in
Lemma 6.2. It is easier to work with the scaled variables $t(J_1, J_k) = g(g-1)^k U_{k+1}$ rather
than with the variables $U_k$, and for convenience we will write $J_i^R = g - 2 - J_i$. Note
that by definition and Lemma 6.2 the random variables $J_i$ and $J_i^R$ both have the same
distribution (uniform on the set $\{0, 1, \ldots, g-2\}$), and therefore also the same moments.

Case 0: If $i, j, k, m$ are distinct, or if $i = j$ and $i, k, m$ are distinct, then

$$E t(J_i, J_j) t(J_k, J_m) = E t(J_i, J_j) \, E t(J_k, J_m),$$

since the random variables $J_i, J_j, J_k, J_m$ (or in the second case $J_i, J_k, J_m$) are independent.
It follows that for any indices $i, j, k, m$ such that $i \ne j$ and $i + k \ne j + m$, the random variables
$U_k(\tau^i X)$ and $U_m(\tau^j X)$ are uncorrelated. (Here, as usual, $\tau$ is the forward shift operator.)

Case 1: If $i, k, m \ge 1$ are distinct then

$$\begin{aligned}
E t(J_i, J_k) t(J_i, J_m) &= E (J_i J_k^R + J_k J_i^R)(J_i J_m^R + J_m J_i^R) \\
&= E J_i J_k^R J_i J_m^R + E J_i J_k^R J_m J_i^R + E J_i^R J_k J_i J_m^R + E J_i^R J_k J_i^R J_m \\
&= ((g-2)^2/4)\bigl(E J_i^2 + E J_i J_i^R + E J_i^R J_i + E (J_i^R)^2\bigr) \\
&= ((g-2)^2/4)\, E (J_i + J_i^R)^2 \\
&= (g-2)^4/4 \\
&= E t(J_i, J_k) \, E t(J_i, J_m).
\end{aligned}$$

Thus, the random variables $t(J_i, J_k)$ and $t(J_i, J_m)$ are uncorrelated. Consequently, for all
choices of $i, j, k \ge 1$ such that $j \ne k$, the random variables $U_j(\tau^i X)$ and $U_k(\tau^i X)$ are
uncorrelated.

Case 2: If $i \ne k$ then

$$\begin{aligned}
E t(J_i, J_i) t(J_i, J_k) &= E J_i J_i^R J_i J_k^R + E J_i J_i^R J_i^R J_k + E J_i^R J_i J_i J_k^R + E J_i^R J_i J_i^R J_k \\
&= ((g-2)/2)\bigl(2 E J_i J_i J_i^R + 2 E J_i J_i^R J_i^R\bigr) \\
&= 2(g-2)\bigl(E J_i J_i (g - 2 - J_i)\bigr) \\
&= 2(g-2)\bigl((g-2) E J_i^2 - E J_i^3\bigr) \\
&= (g-2)^3 (g-3)/6 \\
&= E t(J_i, J_k) \, E t(J_i, J_i).
\end{aligned}$$

Once again, the two random variables are uncorrelated. It follows that for all $i \ge 1$ and
$m \ge 3$ the random variables $U_2(\tau^i X)$ and $U_m(\tau^i X)$ are uncorrelated.



Case 3: If $k \ge 2$ then

$$\begin{aligned}
E t(J_1, J_k)^2 &= E J_1 J_1 J_k^R J_k^R + E J_1^R J_1^R J_k J_k + 2 E J_1 J_1^R J_k J_k^R \\
&= 2 (E J_1^2)^2 + 2 (E J_1 J_1^R)^2 \\
&= (g-2)^2 (2g-3)^2/18 + (g-2)^2 (g-3)^2/18
\end{aligned}$$

and so

$$\begin{aligned}
\text{var}(t(J_1, J_k)) &= E t(J_1, J_k)^2 - (E t(J_1, J_k))^2 \\
&= (g-2)^2 (2g-3)^2/18 + (g-2)^2 (g-3)^2/18 - (g-2)^4/4 \\
&= g^2 (g-2)^2/36.
\end{aligned}$$

Case 4: When $k = 1$:

$$\begin{aligned}
E t(J_1, J_1)^2 &= 4 E J_1 J_1 J_1^R J_1^R \\
&= 4\bigl((g-2)^2 E J_1^2 - 2(g-2) E J_1^3 + E J_1^4\bigr) \\
&= 2(g-2)(g-3)(g^2 - 4g + 5)/15,
\end{aligned}$$

so

$$\begin{aligned}
\text{var}(t(J_1, J_1)) &= E t(J_1, J_1)^2 - (E t(J_1, J_1))^2 \\
&= 2(g-2)(g-3)(g^2 - 4g + 5)/15 - (g-2)^2 (g-3)^2/9 \\
&= g(g-2)(g-3)(g+1)/45.
\end{aligned}$$



This proves:

Corollary 6.4. The random variables $U_k(\tau^i X)$, where $i \ge 0$ and $k \ge 2$, are uncorrelated, and
have variances

$$\text{Var}(U_k(\tau^i X)) = \frac{(g-2)^2}{36 (g-1)^{2k-2}} \quad\text{for } k \ge 3, \qquad \text{Var}(U_2(\tau^i X)) = \frac{(g-2)(g-3)(g+1)}{45 g (g-1)^2}. \tag{44}$$

Consequently,

$$\begin{aligned}
\text{Var}(S_\infty(\tau^i X)) &= \text{Var}(U_2(X)) + \sum_{k=3}^{\infty} \text{Var}(2 U_k(X)) \\
&= \text{Var}(U_2(X)) + \lim_{K \to \infty} \sum_{k=3}^{K} \text{Var}(2 U_k(X)) \\
&= \frac{(g-2)(g-3)(g+1)}{45 g (g-1)^2} + \frac{g-2}{9 g (g-1)^2} \\
&= \frac{(g-2)(g^2 - 2g + 2)}{45 g (g-1)^2} \\
&= \frac{2\chi(2\chi^2 - 2\chi + 1)}{45 (2\chi - 1)^2 (\chi - 1)}.
\end{aligned} \tag{45}$$
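
The variance formulas and the final closed form can be verified exactly by enumerating the uniform cycle gaps. The following Python sketch is a hedged cross-check only: it assumes, as in the text, that the gaps are independent and uniform on $\{0, \ldots, g-2\}$, and it uses the relation $g = 2 - 2\chi$, which is an inference from comparing the last two lines of the display above rather than something stated in this section. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def t(a, b, g):
    """The quadratic form t(a, b) = a(g - 2 - b) + b(g - 2 - a)."""
    return a * (g - 2 - b) + b * (g - 2 - a)


def var_t_distinct(g):
    """Variance of t(J_1, J_k) for independent uniform gaps J_1, J_k."""
    pairs = list(product(range(g - 1), repeat=2))
    m1 = Fraction(sum(t(a, b, g) for a, b in pairs), len(pairs))
    m2 = Fraction(sum(t(a, b, g) ** 2 for a, b in pairs), len(pairs))
    return m2 - m1 ** 2


def var_t_same(g):
    """Variance of t(J_1, J_1) for a single uniform gap J_1."""
    vals = range(g - 1)
    m1 = Fraction(sum(t(a, a, g) for a in vals), g - 1)
    m2 = Fraction(sum(t(a, a, g) ** 2 for a in vals), g - 1)
    return m2 - m1 ** 2


def sigma2_partial(g, K):
    """Var(U_2) + sum_{k=3}^{K} Var(2 U_k), with U_k = t / (g (g-1)^(k-1))."""
    total = var_t_same(g) / (g * (g - 1)) ** 2
    for k in range(3, K + 1):
        total += 4 * var_t_distinct(g) / (g * (g - 1) ** (k - 1)) ** 2
    return total


def sigma2_in_g(g):
    """Closed form in terms of g."""
    return Fraction((g - 2) * (g * g - 2 * g + 2), 45 * g * (g - 1) ** 2)


def sigma2_in_chi(chi):
    """Closed form in terms of the Euler characteristic chi."""
    chi = Fraction(chi)
    return 2 * chi * (2 * chi ** 2 - 2 * chi + 1) / (45 * (2 * chi - 1) ** 2 * (chi - 1))
```

For $g = 4$ (that is, $\chi = -1$) both closed forms evaluate to $1/81$, and the partial sums approach this value geometrically with ratio $(g-1)^{-2}$.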



Appendix A. An example of the combinatorics of self-intersection counts 

The counting of self-intersection numbers is based on the following idea: Two strands
on a surface come close, stay together for some time and then separate. If one strand
enters the strip from "above" and exits "below" and the other vice versa there must be
an intersection. This intersection is measured by the functions $u_k$ and $v_k$, where $k$ gives
the "length of the time" that the strands stay together. (See the figure showing pairs of
subwords for which $u_2 \ne 0$ and $u_3 \ne 0$.)

Example A.1. Let $O$ denote the cyclic word of Figure 4. Consider a 16-gon with alternate
sides labeled with the letters of $O$ as in Figure 5. By "gluing" the sides of this polygon
labeled by the same letter, one obtains a surface $\Sigma$ of genus two and one boundary
component.

FIGURE 4. An example of a word $O$.

Let $\alpha$ be a necklace which can be unhooked to $\alpha^* = abcaacba$. There is a one-to-one
correspondence between the self-intersection points of a representative of $\alpha$ with minimal
self-intersection and the pairs of subwords of $\alpha$ listed in (a), (b) and (c).

(a) $(bc, ca)$, $(bc, ac)$, $(ca, cb)$ and $(ac, cb)$. (These are all the pairs of the form $(c_1 c_2, d_1 d_2)$
such that if $w$ and $w'$ are words with finite or infinite letters, $w = c_1 c_2 \cdots$ and
$w' = d_1 d_2 \cdots$, then $u_2(w, w') = 1$ and $u_2(w', w) = 1$.)

(b) $(caa, aac)$ and $(baa, aab)$. (These are all the pairs $(c_1 c_2 \cdots c_k, d_1 d_2 \cdots d_k)$ of subwords of
$\alpha$, with $k \ge 3$, such that if $w = c_1 c_2 \cdots c_k \cdots$ and $w' = d_1 d_2 \cdots d_k \cdots$ then $u_k(w, w') = 1$
and $u_k(w', w) = 1$.)

(c) $(abca, acba)$. (This is the only pair of subwords $(c_1 c_2 \cdots c_k \cdots, d_1 d_2 \cdots d_k \cdots)$ of $\alpha$ of
more than two letters such that if $w = c_1 c_2 \cdots c_k \cdots$ and $w' = d_1 d_2 \cdots d_k \cdots$ then
$v_k(w, w') = 1$ and $v_k(w', w) = 1$.)

Since there are seven pairs listed in (a), (b) and (c), the self-intersection number of $\alpha$
equals seven.




Clearly the arcs corresponding to the subwords $bc$ and $ca$ of $\alpha$ intersect in the polygon
(see Figure 5(I)). This suggests that the occurrence of $bc$ and $ca$ as subwords of a cyclic
word will imply a self-intersection point in every representative of the cyclic word.

Now, consider the pair of subwords $aa$ and $ca$ of $\alpha$ (see Figure 5(II)). Since both of the
corresponding arcs land in the edge $a$ of the polygon, the occurrence of these two subwords
does not provide enough information to deduce the existence of a self-intersection
point. In order to understand this configuration of segments better, we prolong the subwords
starting with $aa$ and $ca$ until they have different letters at the beginning and at the
end. Then we study how the arcs corresponding to these subwords intersect. So in our
example we get $caa$ and $aac$, implying a self-intersection point (Figure 5(III)).

Appendix B. Background: Probability, Markov chains, weak convergence

For the convenience of the reader we shall review some of the terminology of the subject 
here (All of this is standard, and can be found in most introductory textbooks, for instance, 
HI and 0.) 

A probability space is a measure space $(\Omega, \mathcal{B}, P)$ with total mass 1. Integrals with respect
to $P$ are called expectations and denoted by the letter $E$, or by $E_P$ if the dependence on
$P$ must be emphasized. A random variable is a measurable, real-valued function on $\Omega$;
similarly, a random vector or a random sequence is a measurable function taking values in a
vector space or sequence space. The distribution of a random variable, vector, or sequence
$X$ is the induced probability measure $P \circ X^{-1}$ on the range of $X$. Most questions of interest
in the subject concern the distributions of various random objects, so the particular
probability space on which these objects are defined is usually not important; however,
it is sometimes necessary to move to a "larger" probability space (e.g., a product space)
to ensure that auxiliary random variables can be defined. This is the case, for instance, in
sec. 6, where independent copies of a Markov chain are needed.

Definition B.1. A sequence $\ldots, X_{-1}, X_0, X_1, \ldots$ of $\mathcal{G}$-valued random variables defined
on some probability space $(\Omega, \mathcal{B}, P)$ is said to be a stationary Markov chain with stationary
distribution $\pi$ and transition probabilities $p(a, a')$ if for every finite sequence
$w = w_0 w_1 \cdots w_k$ of elements of $\mathcal{G}$ and every integer $m$,

$$P\{X_{m+j} = w_j \text{ for each } 0 \le j \le k\} = \pi(w_0) \prod_{j=0}^{k-1} p(w_j, w_{j+1}). \tag{46}$$

If $p(a, a')$ is a stochastic matrix on a set $\mathcal{G}$ and $\pi$ satisfies the stationarity condition
$\pi(a) = \sum_{a'} \pi(a') p(a', a)$ then there is a probability measure on the sequence space $\mathcal{G}^{\mathbb{Z}}$ under which
the coordinate variables form a Markov chain with transition probabilities $p(a, a')$ and
stationary distribution $\pi$. This follows from standard measure extension theorems; see,
e.g., [1], sec. 1.8.

Definition B.2. A sequence of random variables $X_n$ (not necessarily all defined on the
same probability space) is said to converge weakly or in distribution to a limit distribution
$F$ on $\mathbb{R}$ (denoted by $X_n \Rightarrow F$) if the distributions $F_n$ of $X_n$ converge to $F$ in the weak
topology on measures, that is, if for every bounded, continuous function $\varphi : \mathbb{R} \to \mathbb{R}$ (or
equivalently, for every continuous function $\varphi$ with compact support),

$$\lim_{n \to \infty} \int \varphi \, dF_n = \int \varphi \, dF.$$

It is also customary to write $F_n \Rightarrow F$ for this convergence, since it is really a property
of the distributions. When the limit distribution $F$ is the point mass $\delta_0$ at 0 we may sometimes
write $X_n \Rightarrow 0$ instead of $X_n \Rightarrow \delta_0$. The weak topology on probability measures is
metrizable; when necessary we will denote by $\varrho$ a suitable metric. It is an elementary fact
that weak convergence of probability distributions on $\mathbb{R}$ is equivalent to the pointwise
convergence of the cumulative distribution functions at all points of continuity of the
limit cumulative distribution function. Thus, Theorem 3.1 is equivalent to the assertion
that the random variables $(N(\alpha) - n^2\kappa)/n^{3/2}$ on the probability spaces $(\mathcal{F}_n, \mu_n)$ converge
in distribution to $\Phi_\sigma$.

We conclude with several elementary tools of weak convergence that will be used repeatedly
throughout the paper. First, given any countable family $X_n$ of random variables,
possibly defined on different probability spaces, there exist on the Lebesgue space
([0, 1], Lebesgue) random variables $Y_n$ such that for each $n$ the random variables $X_n$ and
$Y_n$ have the same distribution. Furthermore, the random variables $Y_n$ can be constructed
in such a way that if the random variables $X_n$ converge in distribution then the random
variables $Y_n$ converge pointwise on [0, 1] (the converse is trivial). Next, define the total
variation distance between two probability measures $\mu$ and $\nu$ defined on a common measurable
space $(\Omega, \mathcal{B})$ by

$$\|\mu - \nu\|_{TV} = \max_{A}\,(\mu(A) - \nu(A))$$

where $A$ ranges over all measurable subsets (events) of $\Omega$. Total variation distance is never
increased by mapping, that is, if $T : \Omega \to \Omega'$ is a measurable transformation then

$$\|\mu \circ T^{-1} - \nu \circ T^{-1}\|_{TV} \le \|\mu - \nu\|_{TV}. \tag{47}$$

Also, if $\mu$ and $\nu$ are mutually absolutely continuous, with Radon-Nikodym derivative
$d\mu/d\nu$, then

$$\|\mu - \nu\|_{TV} = \frac{1}{2} E_\nu \Bigl| \frac{d\mu}{d\nu} - 1 \Bigr|. \tag{48}$$


It is easily seen that if a sequence of probability measures $\{\mu_n\}_{n \ge 1}$ on $\mathbb{R}$ is Cauchy in total
variation distance then the sequence converges in distribution. The following lemma is
elementary:

Lemma B.3. Let $X_n$ and $Y_n$ be two sequences of random variables, all defined on a common
probability space, let $a_n$ be a sequence of scalars, and fix $r > 0$. Denote by $F_n$ and $G_n$ the distributions
of $X_n$ and $Y_n$, respectively. Then the equivalence

$$\frac{Y_n - a_n}{n^r} \Longrightarrow F \quad\text{if and only if}\quad \frac{X_n - a_n}{n^r} \Longrightarrow F \tag{49}$$

holds if either

$$(X_n - Y_n)/n^r \Longrightarrow 0 \quad\text{or} \tag{50}$$

$$\|F_n - G_n\|_{TV} \longrightarrow 0 \tag{51}$$

as $n \to \infty$. Furthermore, (51) implies (50).


The following lemma is an elementary consequence of Chebyshev's inequality and the
definition of weak convergence.

Lemma B.4. Let $X_n$ be a sequence of random variables. Suppose that for every $\varepsilon > 0$ there exist
random variables $X_n^\varepsilon$ and $R_n^\varepsilon$ such that

$$X_n = X_n^\varepsilon + R_n^\varepsilon, \tag{52}$$

$$X_n^\varepsilon \Longrightarrow \text{Normal}(0, \sigma_\varepsilon^2), \quad\text{and}\quad E|R_n^\varepsilon|^2 \le \varepsilon.$$

Then $\lim_{\varepsilon \to 0} \sigma_\varepsilon^2 =: \sigma^2 \ge 0$ exists and is finite, and

$$X_n \Longrightarrow \text{Normal}(0, \sigma^2). \tag{53}$$

ABSTRACT. Oriented closed curves on an orientable surface with boundary are described
up to continuous deformation by reduced cyclic words in the generators of the fundamen- 
tal group and their inverses. By self-intersection number one means the minimum number 
of transversal self-intersection points of representatives of the class. We prove that if a class 
is chosen at random from among all classes of m letters, then for large m the distribution 
of the self-intersection number approaches the Gaussian distribution.

1. Introduction

FIGURE 1. Two representatives of aabb in the doubly punctured plane.
The second curve has fewest self-intersections in its free homotopy class.

Oriented closed curves in a surface with boundary are, up to continuous deformation,
described by reduced cyclic words in a set of free generators of the fundamental group
and their inverses. (Recall that such words represent the conjugacy classes of the fundamental
group.) Given a reduced cyclic word $\alpha$, define the self-intersection number $N(\alpha)$
to be the minimum number of transversal double points among all closed curves represented
by $\alpha$. (See Figure 1.) Fix a positive integer $n$ and consider how the self-intersection
number $N(\alpha)$ varies over the population $\mathcal{F}_n$ of all reduced cyclic words of length $n$. The
value of N(a) can be as small as 0, but no larger than 0(n 2 ). See @, J5j for precise re- 
sults concerning the maximum of N(a) for a G F n , and [13) for sharp results on the 
related problem of determining the growth of the number of non-self-intersecting closed
geodesics up to a given length relative to a hyperbolic metric.

For small values of n, using algorithms in 0, or [4] it is computationally feasible to 
compute the self-intersection counts $N(\alpha)$ for all words $\alpha \in \mathcal{F}_n$. Such computations show
that, even for relatively small $n$, the distribution of $N(\alpha)$ over $\mathcal{F}_n$ is very nearly Gaussian.
(See Figure 2.)

FIGURE 2. A histogram showing the distribution of self-intersection numbers
over all reduced cyclic words of length 19 in the doubly punctured
plane. The horizontal coordinate shows the self-intersection count $k$; the
vertical coordinate shows the number of cyclic reduced words for which
the self-intersection number is $k$.

The purpose of this paper is to prove that as $n \to \infty$ the distribution of
$N(\alpha)$ over the population $\mathcal{F}_n$, suitably scaled, does indeed approach a Gaussian distribution:

Main Theorem. Let $\Sigma$ be an orientable, compact surface with boundary and negative Euler
characteristic $\chi$, and set

$$\kappa = \kappa_\Sigma = \frac{\chi}{3(2\chi - 1)} \quad\text{and}\quad \sigma^2 = \sigma_\Sigma^2 = \frac{2\chi(2\chi^2 - 2\chi + 1)}{45(2\chi - 1)^2(\chi - 1)}. \tag{1}$$

Then for any $a < b$ the proportion of words $\alpha \in \mathcal{F}_n$ such that

$$a < \frac{N(\alpha) - \kappa n^2}{n^{3/2}} < b$$

converges, as $n \to \infty$, to

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_a^b \exp\{-x^2/2\sigma^2\} \, dx.$$

Observe that the limiting variance $\sigma^2$ is positive if the Euler characteristic is negative.
Consequently, the theorem implies that (i) for most words $\alpha \in \mathcal{F}_n$ the
self-intersection number $N(\alpha)$ is to first order well-approximated by $\kappa n^2$; and
(ii) typical variations of $N(\alpha)$ from this first-order approximation ("fluctuations")
are of size $n^{3/2}$.
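
As a quick sanity check on the constants in (1), the following sketch evaluates $\kappa$ and $\sigma^2$ in exact rational arithmetic for a given negative Euler characteristic. The function name `limiting_constants` is an illustrative choice, not notation from the paper; the formulas are simply those of equation (1).

```python
from fractions import Fraction


def limiting_constants(chi: int) -> tuple[Fraction, Fraction]:
    """Return (kappa, sigma^2) from equation (1) for an Euler characteristic chi < 0."""
    if chi >= 0:
        raise ValueError("the Main Theorem assumes negative Euler characteristic")
    x = Fraction(chi)
    kappa = x / (3 * (2 * x - 1))
    sigma2 = 2 * x * (2 * x**2 - 2 * x + 1) / (45 * (2 * x - 1) ** 2 * (x - 1))
    return kappa, sigma2


if __name__ == "__main__":
    # chi = -1 corresponds to the doubly punctured plane of Figures 1 and 2.
    for chi in (-1, -2, -3):
        kappa, sigma2 = limiting_constants(chi)
        print(f"chi={chi}: kappa={kappa}, sigma^2={sigma2}")
```

For $\chi = -1$ both the numerator and the denominator of each fraction in (1) are negative, so $\kappa$ and $\sigma^2$ come out positive, in line with the observation above.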

It is relatively easy to understand (if not to prove) why the number of self-intersections
of typical elements of $\mathcal{F}_n$ should grow like $n^2$. Following is a short heuristic
argument: Consider the lift of a closed curve with minimal self-intersection number in its
class to the universal cover of the surface $\Sigma$. This lift will cross $n$ images of the
fundamental polygon, where $n$ is the corresponding word length, and these crossings can be
used to partition the curve into $n$ nonoverlapping segments in such a way that each segment
makes one crossing of an image of the fundamental polygon. The self-intersection count for
the curve is then the number of pairs of these segments whose images in the fundamental
polygon cross. It is reasonable to guess that for typical classes $\alpha \in \mathcal{F}_n$
(at least when $n$ is large) these segments look like a random sample from the set of all such
segments, and so the law of large numbers then implies that the number of self-intersections
should grow like $n^2\kappa'/2$, where $\kappa'$ is the probability that two randomly chosen
segments across the fundamental polygon will cross. The difficulty in making this argument
precise, of course, is in quantifying the sense in which the segments of a typical closed
curve look like a random sample of segments. The arguments below (see sec. 4) will make this
clear.

The mystery, then, is not why the mean number of self-intersections grows like $n^2$, but
rather why the size of typical fluctuations is of order $n^{3/2}$ and why the limit
distribution is Gaussian. This seems to be connected to geometry. If the surface $\Sigma$ is
equipped with a finite-area Riemannian metric of negative curvature, and if the boundary
components are (closed) geodesics, then each free homotopy class contains a unique closed
geodesic (except for the free homotopy classes corresponding to the punctures). It is
therefore possible to order the free homotopy classes by the length of the geodesic
representative. Fix $L$, and let $\mathcal{G}_L$ be the set of all free homotopy classes whose
closed geodesics are of length $\le L$. The main result of [11] (see also [12]) describes the
variation of the self-intersection count $N(\alpha)$ as $\alpha$ ranges over the population
$\mathcal{G}_L$.

Geometric Sampling Theorem. If the Riemannian metric on $\Sigma$ is hyperbolic (i.e.,
constant curvature $-1$) then there exists a possibly degenerate probability distribution $G$
on $\mathbb{R}$ such that for all $a < b$ the proportion of words $\alpha \in \mathcal{G}_L$
such that

$$
a < \frac{N(\alpha) + L^2/(\pi^2\chi)}{L} < b
$$

converges, as $L \to \infty$, to $G(b) - G(a)$.

The limit distribution is not known, but is almost certainly not Gaussian. The result
leaves open the possibility (which we think unlikely) that the limit distribution is
degenerate (that is, concentrated at a single point); if this were the case, then the true
order of magnitude of the fluctuations might be a fractional power of $L$. The Geometric
Sampling Theorem implies that the typical variation in self-intersection count for a closed
geodesic chosen randomly according to hyperbolic length is of order $L$. Together with the
Main Theorem, this suggests that the much larger variations that occur when sampling by word
length are (in some sense) due to $\sqrt{n}$-variations in hyperbolic length over the
population $\mathcal{F}_n$.

The Main Theorem can be reformulated in probabilistic language as follows (see Appendix A
for definitions):

Main Theorem*. Let $\Sigma$ be an orientable, compact surface with boundary and negative Euler
characteristic $\chi$, and let $\kappa$ and $\sigma$ be defined by (1). Let $N_n$ be the
random variable obtained by evaluating the self-intersection function $N$ at a randomly
chosen $\alpha \in \mathcal{F}_n$. Then as $n \to \infty$,

$$
\frac{N_n - n^2\kappa}{\sigma n^{3/2}} \Longrightarrow \mathrm{Normal}(0,1),
\tag{2}
$$

where $\mathrm{Normal}(0,1)$ is the standard Gaussian distribution on $\mathbb{R}$ and
$\Longrightarrow$ denotes convergence in distribution.



2. Combinatorics of Self-Intersection Counts 



Our analysis is grounded on a purely combinatorial description of the self-intersection 
counts N(a), due to £3, 0, and EJ. 

Since $\Sigma$ has non-empty boundary, its fundamental group $\pi_1(\Sigma)$ is free. We will
work with a generating set of $\pi_1(\Sigma)$ such that each element has a
non-self-intersecting representative. (Such a basis is a natural choice to describe
self-intersections of free homotopy classes.) Denote by $Q$ the set containing the elements
of the generating set and their inverses and by $g$ the cardinality of $Q$. Thus,
$g = 2 - 2\chi$, where $\chi$ denotes the Euler characteristic of $\Sigma$. It is not hard to
see that there exists a (non-unique and possibly non-reduced) cyclic word $O$ of length $g$
such that

(1) $O$ contains each element of $Q$ exactly once.

(2) The surface $\Sigma$ can be obtained as follows: Label the edges of a polygon with $2g$
sides, alternately (so every other edge is not labelled) with the letters of $O$ and glue
edges labeled with the same letter without creating Moebius bands.

This cyclic word $O$ encodes the intersection and self-intersection structure of free
homotopy classes of curves on $\Sigma$.

Since $\pi_1(\Sigma)$ is a free group, the elements of $\pi_1(\Sigma)$ can be identified with
the reduced words (which we will also call strings) in the generators and their inverses. A
string is joinable if each cyclic permutation of its letters is also a string, that is, if
its last letter is not the inverse of its first. A reduced cyclic word (also called a
necklace) is an equivalence class of joinable strings, where two such strings are considered
equivalent if each is a cyclic permutation of the other. Denote by $\mathcal{S}_n$,
$\mathcal{J}_n$, and $\mathcal{F}_n$ the sets of strings, joinable strings, and necklaces,
respectively, of length $n$. Since necklaces correspond bijectively with the conjugacy
classes of the fundamental group, the self-intersection count $\alpha \mapsto N(\alpha)$ can
be regarded as a function on the set $\mathcal{F}_n$ of necklaces. This function pulls back to
a function on the set $\mathcal{J}_n$ of joinable strings, which we again denote by
$N(\alpha)$, that is constant on equivalence classes. By [4] this function has the form

$$
N(\alpha) = \sum_{1 \le i < j \le n} H(\sigma^i\alpha, \sigma^j\alpha),
\tag{3}
$$

where $H = H^{(O)}$ is a symmetric function with values in $\{0, 1\}$ on
$\mathcal{J}_n \times \mathcal{J}_n$ and $\sigma^i\alpha$ denotes the $i$th cyclic permutation
of $\alpha$. (Note: the symbol $\sigma$ also appears in the limiting variance $\sigma^2$ in
(1), but it will be clear from the context which of the two meanings is in force.)

To describe the function $H$ in the representation (3), we must explain the cyclic ordering
of letters. For a cyclic word $\alpha$ (not necessarily reduced), set $o(\alpha) = 1$ if the
letters of $\alpha$ occur in cyclic order in $O$, set $o(\alpha) = -1$ if the letters of
$\alpha$ occur in reverse cyclic order, and set $o(\alpha) = 0$ otherwise. Consider two
(finite or infinite) strings, $\omega = c_1c_2\cdots$ and $\omega' = d_1d_2\cdots$. For each
integer $k \ge 2$ define functions $u_k$ and $v_k$ of such pairs $(\omega, \omega')$ as
follows: First, set $u_k(\omega, \omega') = 0$ unless

(a) both $\omega$ and $\omega'$ are of length at least $k$; and

(b) $c_1 \ne d_1$, $c_k \ne d_k$, and $c_j = d_j$ for all $1 < j < k$.

For any pair $(\omega, \omega')$ such that both (a) and (b) hold, define

$$
u_k(\omega, \omega') =
\begin{cases}
1 & \text{if } k = 2 \text{ and } o(\bar c_1 \bar d_1 c_2 d_2) \ne 0; \\
1 & \text{if } k \ge 3 \text{ and } o(\bar c_1 \bar d_1 c_2) = o(c_k d_k \bar c_{k-1}); \text{ and} \\
0 & \text{otherwise,}
\end{cases}
$$

where a bar denotes the inverse letter. Finally, define $v_2(\omega, \omega') = 0$ for all
strings $\omega, \omega'$, and for $k \ge 3$ define $v_k(\omega, \omega') = 0$ unless both
$\omega$ and $\omega'$ are of length at least $k$, in which case

$$
v_k(\omega, \omega') = u_k(c_1c_2\cdots c_k,\ \bar d_k \bar d_{k-1}\cdots \bar d_1).
$$

(Note: The only reason for defining $v_2$ is to avoid having to write separate sums for the
functions $v_j$ and $u_j$ in formula (4) and the arguments to follow.) Observe that both
$u_k$ and $v_k$ depend only on the first $k$ letters of their arguments. Furthermore, $u_k$
and $v_k$ are defined for arbitrary pairs of strings, finite or infinite; for doubly infinite
sequences $x = \cdots x_{-1}x_0x_1\cdots$ we adopt the convention that

$$
u_k(x) = u_k(x_1x_2\cdots x_k) \quad\text{and}\quad v_k(x) = v_k(x_1x_2\cdots x_k).
$$

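Proposition 2.1. [4] Let $\alpha$ be a primitive necklace of length $n \ge 2$. Unhook $\alpha$
at an arbitrary location to obtain a string $\alpha^* = a_1a_2\cdots a_n$, and let
$\sigma^j\alpha^*$ be the $j$th cyclic permutation of $\alpha^*$. Then

$$
N(\alpha) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=2}^{n}
\bigl( u_k(\sigma^i\alpha^*, \sigma^j\alpha^*) + v_k(\sigma^i\alpha^*, \sigma^j\alpha^*) \bigr).
\tag{4}
$$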

3. Proof of the Main Theorem: Strategy 

Except for the exact values (1) of the limiting constants $\kappa$ and $\sigma^2$, which of
course depend on the specific form of the functions $u_k$ and $v_k$, the conclusions of the
Main Theorem hold more generally for random variables defined by sums of the form

$$
N(\alpha^*) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=2}^{n} h_k(\sigma^i\alpha^*, \sigma^j\alpha^*)
\tag{5}
$$

where $h_k$ are real-valued functions on the space of reduced sequences $\alpha^*$ with
entries in $Q$ satisfying the hypotheses (H0)-(H3) below. The function $N$ extends to
necklaces in an obvious way: for any necklace $\alpha$ of length $n$, unhook $\alpha$ at an
arbitrary place to obtain a joinable string $\alpha^*$, then define
$N(\alpha) = N(\alpha^*)$. Denote by $\lambda_n$, $\mu_n$, and $\nu_n$ the uniform probability
distributions on the sets $\mathcal{J}_n$, $\mathcal{F}_n$, and $\mathcal{S}_n$, respectively.

(H0) Each function $h_k$ is symmetric.

(H1) There exists $C < \infty$ such that $|h_k| \le C$ for all $k \ge 1$.

(H2) For each $k \ge 1$ the function $h_k$ depends only on the first $k$ entries of its
arguments.

(H3) There exist constants $C < \infty$ and $0 < \beta < 1$ such that for all
$n \ge k \ge 1$ and $1 \le i < n$,

$$
E_{\lambda_n} |h_k(\alpha, \sigma^i\alpha)| \le C\beta^k.
$$

In view of (H2), the function $h_k$ is well-defined for any pair of sequences, finite or
infinite, provided their lengths are at least $k$. Hypotheses (H0)-(H2) are clearly satisfied
for $h_k = u_k + v_k$, where $u_k$ and $v_k$ are as in formula (4) and $u_1 = v_1 = 0$; see
Lemma 4.8 of section 4.5 for hypothesis (H3).

Theorem 3.1. Assume that the functions $h_k$ satisfy hypotheses (H0)-(H3), and let
$N(\alpha)$ be defined by (5) for all necklaces $\alpha$ of length $n$. Then there exist
constants $\kappa \in \mathbb{R}$ and $\sigma \ge 0$ (given by equations (22) below) such that
if $F_n$ is the distribution of the random variable $(N(\alpha) - n^2\kappa)/n^{3/2}$ under
the probability measure $\mu_n$, then

$$
F_n \Longrightarrow \mathrm{Normal}(0, \sigma^2).
\tag{6}
$$

Formulas for the limiting constants $\kappa$, $\sigma$ are given (in more general form) in
Theorem 5.1 below. In section 6 we will show that in the case of particular interest, namely
$h_k = u_k + v_k$ where $u_k, v_k$ are as in Proposition 2.1, the constants $\kappa$ and
$\sigma$ defined in Theorem 5.1 assume the values (1) given in the statement of the Main
Theorem.

Modulo the proof of Lemma 4.8 and the calculation of the constants $\sigma$ and $\kappa$, the
Main Theorem follows directly from Theorem 3.1. The proof of Theorem 3.1 will proceed
roughly as follows. First we will prove (Lemma 4.2) that there is a shift-invariant, Markov
probability measure $\nu$ on the space $\mathcal{S}_\infty$ of infinite sequences
$x = x_1x_2\cdots$ whose marginals (that is, the push-forwards under the projection mappings
to $\mathcal{S}_n$) are the uniform distributions $\nu_n$. Using this representation we will
prove, in subsection 4.4, that when $n$ is large the distribution of $N(\alpha)$ under
$\mu_n$ differs negligibly from the distribution of a related random variable defined on the
Markov chain with distribution $\nu$. See Proposition 4.7 for a precise statement. Theorem
3.1 will then follow from a general limit theorem for certain U-statistics of Markov chains
(see Theorem 5.1).

4. The Associated Markov Chain 

4.1. Necklaces, strings, and joinable strings. Recall that a string is a sequence with
entries in the set $Q$ of generators and their inverses such that no two adjacent entries are
inverses. A finite string is joinable if its first and last entries are not inverses. The
sets of length-$n$ strings, joinable strings, and necklaces are denoted by $\mathcal{S}_n$,
$\mathcal{J}_n$, and $\mathcal{F}_n$, respectively, and the uniform distributions on these
sets are denoted by $\nu_n$, $\lambda_n$, and $\mu_n$. Let $A$ be the involutive permutation
matrix with rows and columns indexed by $Q$ whose entries $a(x, y)$ are 1 if $x$ and $y$ are
inverses and 0 otherwise. Let $B$ be the matrix with all entries 1. Then for any $n \ge 1$,

$$
|\mathcal{S}_n| = \mathbf{1}^T (B - A)^{n-1} \mathbf{1}
\quad\text{and}\quad
|\mathcal{J}_n| = \operatorname{trace}\,(B - A)^{n},
$$

where $\mathbf{1}$ denotes the (column) vector all of whose entries are 1. Similar formulas
can be written for the number of strings (or joinable strings) with specified first and/or
last entry. The matrix $B - A$ is a Perron-Frobenius matrix with lead eigenvalue $(g - 1)$;
this eigenvalue is simple, so both $|\mathcal{S}_n|$ and $|\mathcal{J}_n|$ grow at the
precise exponential rate $(g - 1)$, that is, there exist positive constants
$C_S = g/(g - 1)$ and $C_J$ such that

$$
|\mathcal{S}_n| \sim C_S (g - 1)^n \quad\text{and}\quad |\mathcal{J}_n| \sim C_J (g - 1)^n.
$$
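
The counting formulas above can be checked against brute-force enumeration for small $g$ and $n$. The sketch below is illustrative only: it labels the letters $0, \dots, g-1$ and assumes (purely as a coding convention) that letter `x` has inverse `x ^ 1`. The joinable count is taken as the trace of the $n$th power of $B - A$, which is what counting closed walks of length $n$ gives.

```python
from itertools import product


def b_minus_a(g):
    """Matrix B - A: entry (x, y) is 0 if y is the inverse of x, else 1.
    Convention of this sketch: letters are 0..g-1 and x has inverse x ^ 1."""
    return [[0 if y == (x ^ 1) else 1 for y in range(g)] for x in range(g)]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, e):
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(e):
        R = mat_mul(R, M)
    return R


def count_by_matrix(g, n):
    """(|S_n|, |J_n|) from the matrix formulas."""
    M = b_minus_a(g)
    strings = sum(map(sum, mat_pow(M, n - 1)))        # 1^T (B-A)^(n-1) 1
    Mn = mat_pow(M, n)
    joinable = sum(Mn[i][i] for i in range(g))        # trace (B-A)^n
    return strings, joinable


def count_by_enumeration(g, n):
    """(|S_n|, |J_n|) by listing all words of length n."""
    strings = joinable = 0
    for w in product(range(g), repeat=n):
        if all(w[i + 1] != (w[i] ^ 1) for i in range(n - 1)):
            strings += 1
            if w[-1] != (w[0] ^ 1):   # last letter is not the inverse of the first
                joinable += 1
    return strings, joinable


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_by_matrix(4, n), count_by_enumeration(4, n))
```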

Every necklace of length $n$ can be obtained by joining the ends of a joinable string, so
there is a natural surjective mapping $p_n : \mathcal{J}_n \to \mathcal{F}_n$. This mapping is
nearly $n$ to 1: In particular, no necklace has more than $n$ pre-images, and the only
necklaces that do not have exactly $n$ pre-images are those which are periodic with some
period $d \mid n$ smaller than $n$. The number of these exceptional necklaces is vanishingly
small compared to the total number of necklaces. To see this, observe that the total number
of strings of length $n \ge 2$ is $g(g - 1)^{n-1}$; hence, the number of joinable strings is
between $g(g - 1)^{n-2}$ and $g(g - 1)^{n}$. The number of length-$n$ strings with period
$< n$ is bounded above by

$$
\sum_{d \mid n,\ d < n} g(g - 1)^{d-1} \le \text{constant} \times (g - 1)^{n/2}.
$$

This is of smaller exponential order of magnitude than $|\mathcal{J}_n|$, so for large $n$
most necklaces of length $n$ will have exactly $n$ pre-images under the projection $p_n$.
Consequently, as $n \to \infty$,

$$
|\mathcal{F}_n| \sim C_J (g - 1)^n / n.
$$

More important, this implies the following.

Lemma 4.1. Let $\lambda_n$ be the uniform probability distribution on the set
$\mathcal{J}_n$, and let $\mu_n \circ p_n^{-1}$ be the push-forward to $\mathcal{J}_n$ of the
uniform distribution on $\mathcal{F}_n$. Then

$$
\lim_{n \to \infty} \| \lambda_n - \mu_n \circ p_n^{-1} \|_{TV} = 0.
\tag{7}
$$
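
For small alphabets the two measures compared in Lemma 4.1 can be tabulated exhaustively. The sketch below is only an illustration under stated assumptions: letters are $0, \dots, g-1$ with `x ^ 1` standing in for the inverse of `x`, the measure $\mu_n \circ p_n^{-1}$ is read as "pick a necklace uniformly, then one of its pre-images uniformly", and the distance computed is the plain $L^1$ sum, which agrees with the total variation norm up to the normalization convention.

```python
from fractions import Fraction
from itertools import product


def joinable_strings(g, n):
    """All length-n words with no cyclically adjacent inverse pair.
    Convention of this sketch: letter x has inverse x ^ 1."""
    return [w for w in product(range(g), repeat=n)
            if all(w[(i + 1) % n] != (w[i] ^ 1) for i in range(n))]


def necklace_of(w):
    """Canonical representative of the rotation class of w (its smallest rotation)."""
    return min(w[i:] + w[:i] for i in range(len(w)))


def l1_distance(g, n):
    """Sum over joinable strings w of |lambda_n(w) - (mu_n o p_n^{-1})(w)|."""
    strings = joinable_strings(g, n)
    preimages = {}
    for w in strings:
        key = necklace_of(w)
        preimages[key] = preimages.get(key, 0) + 1
    lam = Fraction(1, len(strings))
    num_necklaces = len(preimages)
    return sum(abs(lam - Fraction(1, num_necklaces * preimages[necklace_of(w)]))
               for w in strings)


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, float(l1_distance(4, n)))
```

The distance is not monotone for very small $n$, since short words are often periodic, but it shrinks quickly once periodic necklaces become rare.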



Here $\|\cdot\|_{TV}$ denotes the total variation norm on measures (see the Appendix). By
Lemma A.3 of the Appendix, it follows that the distributions of the random variable
$N(\alpha)$ under the probability measures $\lambda_n$ and $\mu_n$ are asymptotically
indistinguishable.

4.2. The associated Markov measure. The matrix $(B - A)$ has the convenient feature that
its row sums and column sums are all $g - 1$. Therefore, the matrix $P := (B - A)/(g - 1)$ is
a stochastic matrix, with entries

$$
p(a, b) =
\begin{cases}
\theta & \text{if } b \ne a^{-1}, \text{ and} \\
0 & \text{otherwise,}
\end{cases}
\tag{8}
$$

where

$$
\theta = (g - 1)^{-1}.
\tag{9}
$$

In fact, $P$ is doubly stochastic, that is, both its rows and columns sum to 1. Moreover,
$P$ is aperiodic and irreducible, that is, for some $k \ge 1$ (in this case $k = 2$) the
entries of $P^k$ are strictly positive. It is an elementary result of probability theory that
for any aperiodic, irreducible, doubly stochastic matrix $P$ on a finite set $Q$ there exists
a shift-invariant probability measure $\nu$ on sequence space $\mathcal{S}_\infty$, called a
Markov measure, whose value on the cylinder set $C(x_1x_2\cdots x_n)$ consisting of all
sequences whose first $n$ entries are $x_1x_2\cdots x_n$ is

$$
\nu(C(x_1x_2\cdots x_n)) = \frac{1}{g} \prod_{i=1}^{n-1} p(x_i, x_{i+1}).
\tag{10}
$$

Any random sequence $X = (X_1X_2\cdots)$ valued in $\mathcal{S}_\infty$, defined on any
probability space $(\Omega, P)$, whose distribution is $\nu$ is called a stationary Markov
chain with transition probability matrix $P$. In particular, the coordinate process on
$(\mathcal{S}_\infty, \nu)$ is a Markov chain with t.p.m. $P$.
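
The cylinder formula (10) is easy to evaluate directly. The following sketch, which again uses the purely illustrative convention that letter `x` has inverse `x ^ 1`, computes the cylinder probability of every word of length $n$ in exact arithmetic and collects the distinct nonzero values; under (8) and (10) one expects a single value, attained exactly on the strings.

```python
from fractions import Fraction
from itertools import product


def cylinder_probability(word, g):
    """nu(C(x_1 ... x_n)) = (1/g) * prod_i p(x_i, x_{i+1}), with p(a, b) = 1/(g-1)
    unless b is the inverse of a (here: b == a ^ 1, a convention of this sketch)."""
    theta = Fraction(1, g - 1)
    prob = Fraction(1, g)
    for a, b in zip(word, word[1:]):
        prob *= Fraction(0) if b == (a ^ 1) else theta
    return prob


def nonzero_cylinder_values(g, n):
    """Set of distinct nonzero cylinder probabilities over all words of length n."""
    values = {cylinder_probability(w, g) for w in product(range(g), repeat=n)}
    values.discard(Fraction(0))
    return values


def total_mass(g, n):
    """Sum of the cylinder probabilities over all words of length n."""
    return sum(cylinder_probability(w, g) for w in product(range(g), repeat=n))


if __name__ == "__main__":
    print(nonzero_cylinder_values(4, 5), total_mass(4, 5))
```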

Lemma 4.2. Let $X = (X_1X_2\ldots)$ be a stationary Markov chain with transition probability
matrix $P$ defined by (8). Then for any $n \ge 1$ the distribution of the random string
$X_1X_2\cdots X_n$ is the uniform distribution $\nu_n$ on the set $\mathcal{S}_n$.

Proof. The transition probabilities $p(a, b)$ take only two values, 0 and $\theta$, so for
any $n$ the nonzero cylinder probabilities (10) are all the same. Hence, the distribution of
$X_1X_2\cdots X_n$ is the uniform distribution on the set of all strings
$\xi = x_1x_2\cdots x_n$ such that the cylinder probability $\nu(C(\xi))$ is positive. These
are precisely the strings of length $n$. ■

4.3. Mixing properties of the Markov chain. Because the transition probability matrix $P$
defined by (8) is aperiodic and irreducible, the $m$-step transition probabilities (the
entries of the $m$th power $P^m$ of $P$) approach the stationary (uniform) distribution
exponentially fast. The one-step transition probabilities (8) are simple enough that precise
bounds can be given:

Lemma 4.3. The $m$-step transition probabilities $p_m(a, b)$ of the Markov chain with 1-step
transition probabilities (8) satisfy

$$
\left| p_m(a, b) - \frac{1}{g} \right| \le \theta^m,
\tag{11}
$$

where $\theta = 1/(g - 1)$.

Proof. Recall that $P = \theta(B - A)$ where $B$ is the matrix with all entries 1 and $A$ is
an involutive permutation matrix. Hence, $BA = AB = B$ and
$B^2 = gB = ((\theta + 1)/\theta)B$. This implies, by a routine induction argument, that for
every integer $m \ge 1$,

$$
P^m =
\begin{cases}
\theta^m \left\{ \left( \dfrac{\theta^{-m} + 1}{g} \right) B - A \right\} & \text{if } m \text{ is odd, and} \\[2ex]
\theta^m \left\{ \left( \dfrac{\theta^{-m} - 1}{g} \right) B + I \right\} & \text{if } m \text{ is even.}
\end{cases}
$$

The inequality (11) follows directly. ■
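
The induction in the proof of Lemma 4.3 can be double-checked numerically. The sketch below compares repeated multiplication of $P$ with the closed form obtained from $BA = AB = B$, $B^2 = gB$ and $A^2 = I$, namely $P^m = \theta^m\{((\theta^{-m} \pm 1)/g)B \mp A^m\}$ with the upper signs for odd $m$, and then measures the largest deviation of $p_m(a,b)$ from $1/g$. Letters are $0, \dots, g-1$ and `x ^ 1` plays the role of the inverse of `x`; both are conventions of this illustration only.

```python
from fractions import Fraction


def transition_matrix(g):
    """P = theta * (B - A) with theta = 1/(g-1); inverse of letter a is a ^ 1 (assumed)."""
    theta = Fraction(1, g - 1)
    return [[Fraction(0) if b == (a ^ 1) else theta for b in range(g)]
            for a in range(g)]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_by_multiplication(g, m):
    P = transition_matrix(g)
    R = P
    for _ in range(m - 1):
        R = mat_mul(R, P)
    return R


def power_by_closed_form(g, m):
    """theta^m * (c*B - A) for odd m and theta^m * (c*B + I) for even m,
    where c = (theta^(-m) + 1)/g for odd m and (theta^(-m) - 1)/g for even m."""
    theta = Fraction(1, g - 1)
    odd = m % 2 == 1
    c = (theta ** (-m) + (1 if odd else -1)) / g
    rows = []
    for a in range(g):
        row = []
        for b in range(g):
            if odd:
                correction = -1 if b == (a ^ 1) else 0   # entry of -A
            else:
                correction = 1 if a == b else 0          # entry of +I
            row.append(theta ** m * (c + correction))
        rows.append(row)
    return rows


def max_deviation(g, m):
    """max over (a, b) of |p_m(a, b) - 1/g|."""
    Pm = power_by_multiplication(g, m)
    return max(abs(p - Fraction(1, g)) for row in Pm for p in row)


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, max_deviation(4, m), Fraction(1, 3) ** m)
```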

Following is a useful way to reformulate the exponential convergence (11). Let
$X = (X_j)_{j \in \mathbb{Z}}$ be a stationary Markov chain with transition probabilities
(8). For any finite subset $J \subset \mathbb{N}$, let $X_J$ denote the restriction of $X$ to
the index set $J$, that is,

$$
X_J = (X_j)_{j \in J};
$$

for example, if $J$ is the interval $[1, n]$ then $X_J$ is just the random string
$X_1X_2\cdots X_n$. Denote by $\nu_J$ the distribution of $X_J$, viewed as a probability
measure on the set $Q^J$; thus, for any subset $F \subset Q^J$,

$$
\nu_J(F) = P\{X_J \in F\}.
\tag{12}
$$

If $J, K$ are non-overlapping subsets of $\mathbb{N}$, then $\nu_{J \cup K}$ and
$\nu_J \times \nu_K$ are both probability measures on $Q^{J \cup K}$, both with support set
equal to the set of all restrictions of infinite strings.

Lemma 4.4. Let $J, K \subset \mathbb{N}$ be two finite subsets such that
$\max(J) + m \le \min(K)$ for some $m \ge 1$. Then on their common support set,

$$
\frac{1 - g\theta^{m-1}}{1 + g\theta^{m-1}}
\le \frac{d\nu_{J \cup K}}{d(\nu_J \times \nu_K)}
\le 1 + g\theta^{m-1},
\tag{13}
$$

where $\theta = 1/(g - 1)$ and $d\alpha/d\beta$ denotes the Radon-Nikodym derivative
("likelihood ratio") of the probability measure $\alpha$ with respect to $\beta$.

Proof. It suffices to consider the special case where $J$ and $K$ are intervals, because the
general case can be deduced by summing over excluded variables. Furthermore, because the
Markov chain is stationary, the measures $\nu_J$ are invariant by translations (that is,
$\nu_{J+n} = \nu_J$ for any $n \ge 1$), so we may assume that $J = [1, n]$ and
$K = [n + m, n + q]$. Let $x_{J \cup K}$ be the restriction of some infinite string to
$J \cup K$; then, writing $\pi$ for the stationary (uniform) distribution,

$$
(\nu_J \times \nu_K)(x_{J \cup K}) = \pi(x_1) \Bigl( \prod_{j=1}^{n-1} p(x_j, x_{j+1}) \Bigr)
\pi(x_{n+m}) \Bigl( \prod_{j=n+m}^{n+q-1} p(x_j, x_{j+1}) \Bigr)
\quad\text{and}
$$

$$
\nu_{J \cup K}(x_{J \cup K}) = \pi(x_1) \Bigl( \prod_{j=1}^{n-1} p(x_j, x_{j+1}) \Bigr)
p_m(x_n, x_{n+m}) \Bigl( \prod_{j=n+m}^{n+q-1} p(x_j, x_{j+1}) \Bigr).
$$

The result now follows directly from the double inequality (11). ■

4.4. From random joinable strings to random strings. Since
$\mathcal{J}_n \subset \mathcal{S}_n$, the uniform distribution $\lambda_n$ on
$\mathcal{J}_n$ is gotten by restricting the uniform distribution $\nu_n$ on $\mathcal{S}_n$
to $\mathcal{J}_n$ and then renormalizing:

$$
\lambda_n(F) = \frac{\nu_n(F \cap \mathcal{J}_n)}{\nu_n(\mathcal{J}_n)}.
$$

Equivalently, the distribution of a random joinable string is the conditional distribution
of a random string given that its first and last entries are not inverses. Our goal here
is to show that the distributions of the random variable $N(\alpha)$ defined by (5) under
the probability measures $\lambda_n$ and $\nu_n$ differ negligibly when $n$ is large. For
this we will show first that the distributions under $\lambda_n$ and $\nu_n$, respectively,
of the substring gotten by deleting the last $n^{1/2-\varepsilon}$ letters are close in total
variation distance; then we will show that changing the last $n^{1/2-\varepsilon}$ letters has
only a small effect on the value of $N(\alpha)$.

Lemma 4.5. Let $X_1X_2\cdots X_n$ be a random string of length $n$, and $Y_1Y_2\cdots Y_n$ a
random joinable string. For any integer $m \in [1, n - 1]$ let $\nu_{n,m}$ and
$\lambda_{n,m}$ denote the distributions of the random substrings $X_1X_2\cdots X_{n-m}$ and
$Y_1Y_2\cdots Y_{n-m}$. Then $\lambda_{n,m} \ll \nu_{n,m}$, and the Radon-Nikodym derivatives
satisfy

$$
\frac{1 - g\theta^m}{1 + g\theta^m}
\le \frac{d\lambda_{n,m}}{d\nu_{n,m}}
\le \frac{1 + g\theta^m}{1 - g\theta^m},
\tag{14}
$$

where $\theta = 1/(g - 1)$. Consequently,

$$
\| \nu_{n,m} - \lambda_{n,m} \|_{TV} \le 2 \left( \frac{1 + g\theta^m}{1 - g\theta^m} - 1 \right).
\tag{15}
$$

Proof. Consider first the case $m = 0$. Recall that there are precisely $g(g - 1)^{n-1}$
strings of length $n$, and at least $g(g - 1)^{n-2}(g - 2)$ of these are joinable. Hence, the
likelihood ratio $d\lambda_{n,0}/d\nu_{n,0}$ is bounded above by $(g - 1)/(g - 2)$ and below
by 1. This proves (14) for $m = 0$. The case $m = 1$ follows similarly.

The general case $m \ge 2$ follows from the exponential ergodicity estimates (11) by an
argument much like that used to prove Lemma 4.4. For any string $x_1x_2\cdots x_{n-m}$ with
initial letter $x_1 = a$,

$$
\nu_{n,m}(x_1x_2\cdots x_{n-m}) = \frac{1}{g} \prod_{i=1}^{n-m-1} p(x_i, x_{i+1}).
$$

Similarly, by Lemma 4.2, conditioning on the event that the last letter is not $a^{-1}$ gives

$$
\lambda_{n,m}(x_1x_2\cdots x_{n-m}) = \frac{1}{g} \Bigl( \prod_{i=1}^{n-m-1} p(x_i, x_{i+1}) \Bigr)
\frac{1 - p_m(x_{n-m}, a^{-1})}{1 - g^{-1} \sum_{c \in Q} p_{n-1}(c, c^{-1})}.
$$

Inequality (11) implies that the last fraction in this expression is between the bounding
fractions in (14). The bound on the total variation distance between the two measures
follows routinely. ■

Corollary 4.6. Let $X$ be a stationary Markov chain with transition probability matrix $P$.
Assume that the functions $h_k$ satisfy hypotheses (H0)-(H3) of section 3. Then for all
$k, i \ge 1$,

$$
E |h_k(X, \tau^i X)| \le C\beta^k.
\tag{16}
$$

Proof. The function $h_k(x, \tau^i x)$ is a function only of the coordinates
$x_1x_2\cdots x_{i+k}$, and so for any joinable string $x$ of length $> i + k$,

$$
h_k(x, \tau^i x) = h_k(x, \sigma^i x).
$$

By Lemma 4.5 the difference in total variation norm between the distributions of the
substring $x_1x_2\cdots x_{i+k}$ under the measures $\lambda_n$ and $\nu_n$ converges to 0 as
$n \to \infty$. Therefore,

$$
E |h_k(X, \tau^i X)| = \lim_{n \to \infty} E_{\lambda_n} |h_k(\alpha, \sigma^i\alpha)| \le C\beta^k.
$$

■



Now we are in a position to compare the distribution of the random variable $N(\alpha)$
under $\mu_n$ with the distribution of a corresponding random variable $\tilde N$ on the
sequence space $\mathcal{S}_\infty$ under the measure $\nu$. The function $\tilde N$ is
defined by

$$
\tilde N(x) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=1}^{\infty} h_k(\tau^i x, \tau^j x).
\tag{17}
$$

Proposition 4.7. Assume that the functions $h_k$ satisfy hypotheses (H0)-(H3), and let
$\kappa = \sum_{k=1}^{\infty} EH_k$. Let $F_n$ be the distribution of the random variable
$(N(\alpha) - n^2\kappa)/n^{3/2}$ under the uniform probability measure $\mu_n$ on
$\mathcal{F}_n$, and $G_n$ the distribution of $(\tilde N(x) - \kappa n^2)/n^{3/2}$ under
$\nu$. Then for any metric $\varrho$ that induces the topology of weak convergence on
probability measures,

$$
\lim_{n \to \infty} \varrho(F_n, G_n) = 0.
\tag{18}
$$

Consequently, $F_n \Longrightarrow \Phi_\sigma$ if and only if
$G_n \Longrightarrow \Phi_\sigma$.

Proof. Let $F'_n$ be the distribution of the random variable
$(N(\alpha) - n^2\kappa)/n^{3/2}$ under the uniform probability measure $\lambda_n$ on
$\mathcal{J}_n$. By Lemma 4.1 the total variation distance between $\lambda_n$ and
$\mu_n \circ p_n^{-1}$ is vanishingly small for large $n$. Hence, by Lemma A.3 and the fact
that total variation distance is never increased by mapping (cf. inequality (45) of the
Appendix),

$$
\lim_{n \to \infty} \varrho(F_n, F'_n) = 0.
$$

Therefore, it suffices to prove (18) with $F_n$ replaced by $F'_n$.

Partition the sums (5) and (17) as follows. Fix $0 < \delta < 1/2$ and set
$m = m(n) = [n^\delta]$. By hypothesis (H3) and Corollary 4.6,

$$
E_{\lambda_n} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k > m(n)} |h_k(\sigma^i\alpha, \sigma^j\alpha)|
\le C n^2 \beta^{m(n)}
\quad\text{and}
$$

$$
E_{\nu} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k > m(n)} |h_k(\tau^i x, \tau^j x)|
\le C n^2 \beta^{m(n)}.
$$

These upper bounds are rapidly decreasing in $n$. Hence, by Markov's inequality (i.e., the
crude bound $P\{|Y| > \varepsilon\} \le E|Y|/\varepsilon$), the distributions of both of the
sums converge weakly to 0 as $n \to \infty$. Thus, by Lemma A.3, to prove the proposition it
suffices to prove that

$$
\lim_{n \to \infty} \varrho(F_n^A, G_n^A) = 0,
$$

where $F_n^A$ and $G_n^A$ are the distributions of the truncated sums obtained by replacing
the inner sums in (5) and (17) by the sums over $1 \le k \le m(n)$.

The outer sums in (5) and (17) are over pairs of indices $1 \le i < j \le n$. Consider those
pairs for which $j > n - 2m(n)$: there are only $2nm(n)$ of these. Since
$nm(n) = O(n^{1+\delta})$ and $\delta < 1/2$, and since each term in (5) and (17) is bounded
in absolute value by a constant $C$ (by Hypothesis (H1)), the sum over those index pairs
$i < j$ with $n - 2m(n) < j \le n$ is $o(n^{3/2})$. Hence, by Lemma A.3 it suffices to prove
that

$$
\lim_{n \to \infty} \varrho(F_n^B, G_n^B) = 0,
$$

where $F_n^B$ and $G_n^B$ are the distributions under $\lambda_n$ and $\nu$ of the sums (5)
and (17) with the limits of summation changed to $i < j \le n - 2m(n)$ and $k \le m(n)$. Now
if $i < j \le n - 2m(n)$ and $k \le m(n)$ then $h_k(\tau^i x, \tau^j x)$ and
$h_k(\sigma^i\alpha, \sigma^j\alpha)$ depend only on the first $n - m(n)$ entries of $x$ and
$\alpha$. Consequently, the distributions $F_n^B$ and $G_n^B$ are the distributions of the
sums

$$
\sum_{i=1}^{n-2m(n)} \sum_{j=i+1}^{n-2m(n)} \sum_{k \le m(n)} h_k(\tau^i x, \tau^j x)
$$

under the probability measures $\lambda_{n,m}$ and $\nu_{n,m}$, respectively, where
$\lambda_{n,m}$ and $\nu_{n,m}$ are as defined in Lemma 4.5. But the total variation distance
between $\lambda_{n,m}$ and $\nu_{n,m}$ converges to zero, by Lemma 4.5. Therefore, by the
mapping principle (45) and Lemma A.3, $\lim_{n \to \infty} \varrho(F_n^B, G_n^B) = 0$. ■



4.5. Mean Estimates. In this section we show that the hypothesis (H3) is satisfied by the
functions $h_k = u_k + v_k$, where $u_k$ and $v_k$ are as in Proposition 2.1.

Lemma 4.8. Let $\sigma^i$ be the $i$th cyclic shift on the set $\mathcal{J}_n$ of joinable
sequences $\alpha$. There exists $C < \infty$ such that for all $2 \le k \le n$ and
$0 \le i < j < n$,

$$
E_{\lambda_n} u_k(\sigma^i\alpha, \sigma^j\alpha) \le C\theta^{k/2}
\quad\text{and}\quad
E_{\lambda_n} v_k(\sigma^i\alpha, \sigma^j\alpha) \le C\theta^{k/2}.
\tag{19}
$$

Proof. Because the measure $\lambda_n$ is invariant under both cyclic shifts and the
reversal function, it suffices to prove the estimates only for the case where one of the
indices $i, j$ is 0. If the proper choice is made ($i = 0$ and $j \le n/2$), then a necessary
condition for $u_k(\alpha, \sigma^j\alpha) \ne 0$ is that the strings $\alpha$ and
$\sigma^j\alpha$ agree in their second through their $(k - 1)/2$th slots. By routine counting
arguments (as in section 4.1) it can be shown that the number of joinable strings of length
$n$ with this property is bounded above by $C(g - 1)^{n-k/2}$, where $C < \infty$ is a
constant independent of both $n$ and $k \le n$. This proves the first inequality. A similar
argument proves the second. ■



5. U-Statistics of Markov chains 

Proposition 4.7 implies that for large $n$ the distribution $F_n$ considered in Theorem 3.1
is close in the weak topology to the distribution $G_n$ of the random variable $\tilde N$
defined by (17), under the Markov measure $\nu$. Consequently, if it can be shown that
$G_n \Longrightarrow \Phi_\sigma$ then the conclusion $F_n \Longrightarrow \Phi_\sigma$ will
follow, by Lemma A.3 of the Appendix. This will prove Theorem 3.1.

Random variables of the form (17) are known generically in probability theory as
U-statistics (see [9]). Second order U-statistics of Markov chains are defined as follows.
Let $Z = Z_1Z_2\cdots$ be a stationary, aperiodic, irreducible Markov chain on a finite state
space $\mathcal{A}$ with transition probability matrix $Q$ and stationary distribution
$\pi$. Let $\tau$ be the forward shift on sequence space $\mathcal{A}^{\mathbb{N}}$. The
U-statistics of order 2 with kernel
$h : \mathcal{A}^{\mathbb{N}} \times \mathcal{A}^{\mathbb{N}} \to \mathbb{R}$ are the random
variables

$$
W_n := \sum_{i=1}^{n} \sum_{j=i+1}^{n} h(\tau^i Z, \tau^j Z).
$$

The Hoeffding projection of a kernel $h$ is the function
$H : \mathcal{A}^{\mathbb{N}} \to \mathbb{R}$ defined by

$$
H(z) = Eh(z, Z).
$$

Theorem 5.1. Suppose that $h = \sum_{k=1}^{\infty} h_k$ where $\{h_k\}_{k \ge 1}$ is a
sequence of kernels satisfying hypotheses (H0)-(H2) and the following: There exist constants
$C < \infty$ and $0 < \beta < 1$ such that for all $k, i \ge 1$,

$$
E |h_k(Z, \tau^i Z)| \le C\beta^k.
\tag{20}
$$

Then as $n \to \infty$,

$$
\frac{W_n - n^2\kappa}{n^{3/2}} \Longrightarrow \mathrm{Normal}(0, \sigma^2),
\tag{21}
$$

where the constants $\kappa$ and $\sigma^2$ are

$$
\kappa = \sum_{k=1}^{\infty} EH_k(Z)
\quad\text{and}\quad
\sigma^2 = \lim_{n \to \infty} \frac{1}{n} E \Bigl( \sum_{i=1}^{n} \sum_{k=1}^{\infty} H_k(\tau^i Z) - n\kappa \Bigr)^2.
\tag{22}
$$

There are similar theorems in the literature, but all require some degree of additional
continuity of the kernel $h$. In the special case where all but finitely many of the
functions $h_k$ are identically 0 the result is a special case of Theorem 1 of [8] or Theorem
2 of [10]. If the functions $h_k$ satisfy the stronger hypothesis that $|h_k| \le C\beta^k$
pointwise then the result follows (with some work) from Theorem 2 of [10]. Unfortunately, the
special case of interest to us, where $h_k = u_k + v_k$ and $u_k, v_k$ are the functions
defined in sec. 2, does not satisfy this hypothesis.

The rest of section 5 is devoted to the proof. The main step is to reduce the problem to the
special case where all but finitely many of the functions $h_k$ are identically 0 by
approximation; this is where the hypothesis (20) will be used. The special case, as already
noted, can be deduced from the results of [8] or [10], but instead we shall give a short and
elementary argument.

In proving Theorem 5.1 we can assume that all of the Hoeffding projections $H_k$ have mean

$$
EH_k(Z) = 0,
$$

because subtracting a constant from both $h$ and $\kappa$ has no effect on the validity of
the theorem. Note that this does not imply that $Eh_k(\tau^i Z, \tau^j Z) = 0$, but it does
imply (by Fubini's theorem) that if $Z$ and $Z'$ are independent copies of the Markov chain
then

$$
Eh_k(Z, Z') = 0.
$$

5.1. Proof in the special case. If all but finitely many of the functions $h_k$ are 0 then
for some finite value of $K$ the kernel $h$ depends only on the first $K$ entries of its
arguments.

Lemma 5.2. Without loss of generality, we can assume that $K = 1$.

Proof. If $Z_1Z_2\cdots$ is a stationary Markov chain, then so is the sequence
$Z_1^K Z_2^K \cdots$ where

$$
Z_i^K = Z_iZ_{i+1}\cdots Z_{i+K}
$$

is the length-$(K + 1)$ word obtained by concatenating the $K + 1$ states of the original
Markov chain following $Z_i$. Hence, the U-statistics $W_n$ can be represented as
U-statistics on a different Markov chain with kernel depending only on the first entries of
its arguments. It is routine to check that the constants $\kappa$ and $\sigma^2$ defined by
(22) for the chain $Z_i^K$ equal those defined by (22) for the original chain. ■

Assume now that $h$ depends only on the first entries of its arguments. Then the Hoeffding
projection $H$ also depends only on the first entry of its argument, and can be written as

$$
H(z) = Eh(z, Z_1) = \sum_{z' \in \mathcal{A}} h(z, z')\pi(z').
$$

Since the Markov chain $Z_n$ is stationary and ergodic, the covariances
$EH(Z_i)H(Z_{i+n}) = EH(Z_1)H(Z_{1+n})$ decay exponentially in $n$, so the limit

$$
\sigma^2 := \lim_{n \to \infty} \frac{1}{n} E \Bigl( \sum_{i=1}^{n} H(Z_i) \Bigr)^2
\tag{23}
$$

exists and is nonnegative. It is an elementary fact that $\sigma^2 > 0$ unless $H \equiv 0$.
Say that the kernel $h$ is centered if this is the case. If $h$ is not centered then the
adjusted kernel

$$
h^*(z, z') := h(z, z') - H(z) - H(z')
\tag{24}
$$

is centered, because its Hoeffding projection satisfies

$$
\begin{aligned}
H^*(z) &:= Eh^*(z, Z_1) \\
&= Eh(z, Z_1) - EH(z) - EH(Z_1) \\
&= H(z) - H(z) - 0.
\end{aligned}
$$

Define

$$
T_n = \sum_{i=1}^{n} \sum_{j=1}^{n} h(Z_i, Z_j)
\quad\text{and}\quad
D_n = \sum_{i=1}^{n} h(Z_i, Z_i);
$$

then since the kernel $h$ is symmetric,

$$
W_n = \tfrac{1}{2}(T_n - D_n).
\tag{25}
$$
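
The objects just introduced are finite sums and can be computed directly when the kernel depends only on first entries. The sketch below is a toy illustration, not part of the proof: the state space is $\{0, \dots, m-1\}$, the kernel is an arbitrary symmetric table, the chain is the one with transition matrix (8) (with `x ^ 1` standing in for the inverse letter), and all function names are hypothetical. It subtracts the mean so that $EH(Z) = 0$, as assumed above, forms the adjusted kernel (24), and evaluates $W_n$, $T_n$ and $D_n$ on a sample path.

```python
import random
from fractions import Fraction


def hoeffding_projection(h, pi):
    """H(z) = sum_{z'} h(z, z') * pi(z') for a kernel given as a square table."""
    m = len(pi)
    return [sum(h[z][w] * pi[w] for w in range(m)) for z in range(m)]


def centered_kernel(h, pi):
    """Subtract the mean so that E H(Z) = 0, then return h*(z, z') = h - H(z) - H(z')."""
    m = len(pi)
    H = hoeffding_projection(h, pi)
    mean = sum(H[z] * pi[z] for z in range(m))
    h0 = [[h[z][w] - mean for w in range(m)] for z in range(m)]
    H0 = hoeffding_projection(h0, pi)
    return [[h0[z][w] - H0[z] - H0[w] for w in range(m)] for z in range(m)]


def u_statistics(h, path):
    """Return (W_n, T_n, D_n) for a kernel table h and a sample path of states."""
    n = len(path)
    W = sum(h[path[i]][path[j]] for i in range(n) for j in range(i + 1, n))
    T = sum(h[a][b] for a in path for b in path)
    D = sum(h[a][a] for a in path)
    return W, T, D


def sample_path(g, n, rng):
    """Uniform start, then uniform over the letters that are not the inverse (x ^ 1)."""
    path = [rng.randrange(g)]
    for _ in range(n - 1):
        path.append(rng.choice([b for b in range(g) if b != (path[-1] ^ 1)]))
    return path


if __name__ == "__main__":
    g = 4
    pi = [Fraction(1, g)] * g                                      # uniform stationary law
    h = [[(z * w + z + w) % 3 for w in range(g)] for z in range(g)]  # symmetric toy kernel
    W, T, D = u_statistics(h, sample_path(g, 200, random.Random(0)))
    print(W, T, D, 2 * W == T - D)
    print(hoeffding_projection(centered_kernel(h, pi), pi))
```

The identity (25) holds path by path because the off-diagonal terms of $T_n$ count each unordered pair twice.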
Proposition 5.3. Ifh is centered, then 

(26) T n /n =► Q 

where Q is a quadratic form in no more than m = \Z\ independent, standard normal random 
variables. 

Proof. Consider the linear operator $L_h$ on $\ell^2(\mathcal{Z}, \pi)$ defined by

    $L_h f(z) := \sum_{z' \in \mathcal{Z}} h(z, z') f(z') \pi(z').$

This operator is symmetric (real Hermitian), and consequently has a complete set of orthonormal real eigenvectors $\varphi_j(z)$ with real eigenvalues $\lambda_j$. Since $h$ is centered, the constant function $\varphi_1 := 1/\sqrt{m}$ is an eigenvector with eigenvalue $\lambda_1 = 0$; therefore, all of the other eigenvectors $\varphi_j$, being orthogonal to $\varphi_1$, must have mean zero. Hence, since $\lambda_1 = 0$,

    $h(z, z') = \sum_{j=2}^{m} \lambda_j \varphi_j(z) \varphi_j(z'),$

and so

(27)  $T_n = \sum_{k=2}^{m} \lambda_k \sum_{i=1}^{n} \sum_{j=1}^{n} \varphi_k(Z_i) \varphi_k(Z_j) = \sum_{k=2}^{m} \lambda_k \Bigl(\sum_{i=1}^{n} \varphi_k(Z_i)\Bigr)^2.$

Since each $\varphi_k$ has mean zero and variance 1 relative to $\pi$, the central limit theorem for Markov chains implies that as $n \to \infty$,

(28)  $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varphi_k(Z_i) \Rightarrow \mathrm{Normal}(0, \sigma_k^2),$

with limiting variances $\sigma_k^2 \ge 0$. In fact, these normalized sums converge jointly¹ (for $k = 2, 3, \ldots, m$) to a multivariate normal distribution with marginal variances $\sigma_k^2 \ge 0$. The result therefore follows from the spectral representation (27). ■
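
The spectral identity (27) holds path by path, which makes it easy to test. The sketch below is a toy instance under stated assumptions: the state space has four points with the uniform distribution, the functions in `phi` are the three mean-zero sign patterns (which are orthonormal for that distribution), and the eigenvalues in `lam` are hypothetical.

```python
from fractions import Fraction as F

# Mean-zero functions on four states, orthonormal in L^2 of the uniform distribution.
phi = {2: [1, 1, -1, -1], 3: [1, -1, 1, -1], 4: [1, -1, -1, 1]}
lam = {2: 3, 3: -1, 4: 2}          # hypothetical eigenvalues
pi = [F(1, 4)] * 4


def h(a, b):
    """Centered symmetric kernel built from its spectral representation."""
    return sum(lam[k] * phi[k][a] * phi[k][b] for k in phi)


def apply_L(f):
    """The operator L_h f(z) = sum over z' of h(z, z') f(z') pi(z')."""
    return [sum(h(a, b) * f[b] * pi[b] for b in range(4)) for a in range(4)]


z = [0, 3, 1, 1, 2, 0, 3, 2, 2, 1]   # hypothetical sample path
T_direct = sum(h(a, b) for a in z for b in z)
T_spectral = sum(lam[k] * sum(phi[k][a] for a in z) ** 2 for k in phi)
```

Here `apply_L` sends the constant function to zero (the kernel is centered) and each `phi[k]` to `lam[k]` times itself, and the two ways of computing $T_n$ agree.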

Corollary 5.4. If $h$ is not centered, then with $\sigma^2 > 0$ as defined in (23),

(29)  $W_n / n^{3/2} \Rightarrow \mathrm{Normal}(0, \sigma^2).$

Proof. Recall that $W_n = (T_n - D_n)/2$. By the ergodic theorem, $\lim_{n \to \infty} D_n / n = E h(Z_1, Z_1)$ almost surely, so $D_n / n^{3/2} \Rightarrow 0$. Hence, by Lemma A.3 of the Appendix, it suffices to prove that if $h$ is not centered then

(30)  $T_n / n^{3/2} \Rightarrow \mathrm{Normal}(0, 4\sigma^2).$

Define the centered kernel $h^*$ as in (24). Since the Hoeffding projection of $h^*$ is identically 0,

    $T_n = T_n^* + 2n \sum_{i=1}^{n} H(Z_i)$ where $T_n^* = \sum_{i=1}^{n} \sum_{j=1}^{n} h^*(Z_i, Z_j).$

Proposition 5.3 implies that $T_n^* / n$ converges in distribution, and it follows that $T_n^* / n^{3/2}$ converges to 0 in distribution. On the other hand, the central limit theorem for Markov chains implies that

    $n^{-3/2} \Bigl(2n \sum_{i=1}^{n} H(Z_i)\Bigr) \Rightarrow \mathrm{Normal}(0, 4\sigma^2),$

with $\sigma^2 > 0$, since by hypothesis the kernel $h$ is not centered. The weak convergence (30) now follows by Lemma A.3. ■

¹ Note, however, that the normalized sums in (28) need not be asymptotically independent for different $k$, despite the fact that the different functions $\varphi_k$ are uncorrelated relative to $\pi$. This is because the arguments $Z_i$ are serially correlated: in particular, even though $\varphi_k(Z_i)$ and $\varphi_l(Z_i)$ are uncorrelated, the random variables $\varphi_k(Z_i)$ and $\varphi_l(Z_{i+1})$ might well be correlated.

5.2. Variance/covariance bounds. To prove Theorem 5.1 in the general case we will show that truncation of the kernel $h$, that is, replacing $h = \sum_{k=1}^{\infty} h_k$ by $h^K = \sum_{k=1}^{K} h_k$, has only a small effect on the distributions of the normalized random variables $W_n / n^{3/2}$ when $K$ is large. For this we will use second moment bounds. To deduce these from the first-moment hypothesis (20) we shall appeal to the fact that any aperiodic, irreducible, finite-state Markov chain is exponentially mixing. Exponential mixing is expressed in the same manner as for the Markov chain considered in section 4.3. For any finite subset $J \subset \mathbb{N}$, let $Z_J = (Z_j)_{j \in J}$ denote the restriction of $Z$ to the index set $J$, and denote by $\mu_J$ the distribution of $Z_J$. If $I, J$ are nonoverlapping subsets of $\mathbb{N}$ then both $\mu_{I \cup J}$ and $\mu_I \times \mu_J$ are probability measures supported by $\mathcal{A}^{I \cup J}$. If the distance between the sets $I$ and $J$ is at least $m_*$, where $m_*$ is the smallest integer such that all entries of $Q^{m_*}$ are positive, then $\mu_{I \cup J}$ and $\mu_I \times \mu_J$ are mutually absolutely continuous.

Lemma 5.5. There exist constants $C < \infty$ and $0 < \beta < 1$ such that for any two subsets $I, J \subset \mathbb{N}$ satisfying $\min(J) - \max(I) = m \ge m_*$,

    $1 - C\beta^m \le \frac{d\mu_{I \cup J}}{d(\mu_I \times \mu_J)} \le 1 + C\beta^m.$

The proof is nearly identical to that of Lemma 4.4, except that the exponential convergence bounds of Lemma 4.3 must be replaced by corresponding bounds for the transition probabilities of $Z$. The corresponding bounds are obtained from the Perron-Frobenius theorem.
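
For a concrete feel for Lemma 5.5, the likelihood ratio can be computed exactly for a two-state chain with single-point index sets $I = \{0\}$ and $J = \{m\}$, where it reduces to $Q^m(a, b) / \pi(b)$. The sketch below is illustrative only: the transition matrix `Q`, and the constants `C` and `beta` derived from it, are assumptions chosen for the example.

```python
from fractions import Fraction as F

Q = [[F(2, 3), F(1, 3)], [F(1, 2), F(1, 2)]]   # hypothetical transition matrix
pi = [F(3, 5), F(2, 5)]                         # its stationary distribution


def step(P):
    """Multiply the 2 x 2 matrix P by Q on the right."""
    return [[sum(P[a][c] * Q[c][b] for c in range(2)) for b in range(2)] for a in range(2)]


def max_ratio_deviation(m):
    """Largest |P(Z_0 = a, Z_m = b) / (pi(a) pi(b)) - 1| over states a, b."""
    P = [[F(1), F(0)], [F(0), F(1)]]
    for _ in range(m):
        P = step(P)
    return max(abs(P[a][b] / pi[b] - 1) for a in range(2) for b in range(2))


C, beta = F(3, 2), F(1, 6)   # beta is the second eigenvalue of this Q
deviations = [max_ratio_deviation(m) for m in range(1, 8)]
```

For this chain the deviation of the likelihood ratio from 1 is at most `C * beta**m` for every gap `m` tested, which is the shape of bound the lemma asserts.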

For any two random variables $U, V$ denote by $\mathrm{cov}(U, V) = E(UV) - EU \, EV$ their covariance. (When $U = V$ the covariance is $\mathrm{cov}(U, V) = \mathrm{Var}(U)$.)

Lemma 5.6. For any two pairs $i < j$ and $i' < j'$ of indices, let $\Delta = \Delta(i, i', j, j')$ be the distance between the sets $\{i, j\}$ and $\{i', j'\}$ (that is, the minimum distance between one of $i, j$ and one of $i', j'$). Then for all $\Delta \ge \max(k, k') + m_*$,

(31)  $|\mathrm{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C \beta^{k + k'} \varrho_{\Delta - \max(k, k')}$

where

    $\varrho_m = \Bigl(\frac{1 + C\beta^{m-1}}{1 - C\beta^{m-1}}\Bigr)^5 - 1$ for $m \ge m_*$.

Remark 5.7. What is important is that the covariances decay exponentially in both $k + k'$ and $\Delta$; the rates will not matter. When $\Delta < \max(k, k') + m_*$ the bounds (31) do not apply. However, in this case, since the functions $h_k$ are bounded above in absolute value by a constant $C < \infty$ independent of $k$ (hypothesis (H1)), the Cauchy-Schwarz inequality implies

    $|\mathrm{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))|^2 \le (E h_k(\tau^i Z, \tau^j Z) h_{k'}(\tau^{i'} Z, \tau^{j'} Z))^2$
    $\le E h_k(\tau^i Z, \tau^j Z)^2 \, E h_{k'}(\tau^{i'} Z, \tau^{j'} Z)^2$
    $\le C^2 E h_k(\tau^i Z, \tau^j Z) \, E h_{k'}(\tau^{i'} Z, \tau^{j'} Z),$

the last expression being controlled by the first moment hypothesis (20).

Proof of Lemma 5.6. Since the random variables $h_k$ are functions only of the first $k$ letters of their arguments, the covariances can be calculated by averaging against the measures $\mu_{J \cup K}$, where

    $J = [i, i + k] \cup [j, j + k]$ and $K = [i', i' + k'] \cup [j', j' + k']$.

The simplest case is where $j + k < i'$; in this case the result of Lemma 5.5 applies directly, because the sets $J$ and $K$ are separated by $m = \Delta - k$. Since the functions $h_k$ are uniformly bounded, Lemma 5.5 implies

    $\frac{1 - C\beta^{m-1}}{1 + C\beta^{m-1}} \le \frac{E h_k(\tau^i Z, \tau^j Z) h_{k'}(\tau^{i'} Z, \tau^{j'} Z)}{E h_k(\tau^i Z, \tau^j Z) \, E h_{k'}(\tau^{i'} Z, \tau^{j'} Z)} \le \frac{1 + C\beta^{m-1}}{1 - C\beta^{m-1}}.$

The inequalities in (31) now follow, by the assumption (20). (In this special case the bounds obtained are tighter than those in (31).)

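The other cases are similar, but the exponential ergodicity estimate (13) must be used indirectly, since the index sets $J$ and $K$ need not be ordered as required by Lemma 4.4. Consider, for definiteness, the case where

    $i + k < i' < i' + k' < j < j + k < j' < j' + k'.$

To bound the relevant likelihood ratio in this case, use the factorization

    $\frac{d\mu_{J \cup K}}{d(\mu_J \times \mu_K)} = \frac{d\mu_{J \cup K}}{d(\mu_{J^- \cup K^-} \times \mu_{J^+ \cup K^+})} \times \frac{d(\mu_{J^- \cup K^-} \times \mu_{J^+ \cup K^+})}{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})} \times \frac{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})}{d(\mu_J \times \mu_K)}$

where $J^- = [i, i + k]$, $J^+ = [j, j + k]$, $K^- = [i', i' + k']$, and $K^+ = [j', j' + k']$. For the second and third factors, use the fact that Radon-Nikodym derivatives of product measures factor, e.g.,

    $\frac{d(\mu_{J^-} \times \mu_{K^-} \times \mu_{J^+} \times \mu_{K^+})}{d(\mu_J \times \mu_K)} = \frac{d(\mu_{J^-} \times \mu_{J^+})}{d\mu_J} \times \frac{d(\mu_{K^-} \times \mu_{K^+})}{d\mu_K}.$

Now Lemma 5.5 can be used to bound each of the resulting five factors. This yields the following bounds:

    $\Bigl(\frac{1 - C\beta^{m-1}}{1 + C\beta^{m-1}}\Bigr)^5 \le \frac{d\mu_{J \cup K}}{d(\mu_J \times \mu_K)} \le \Bigl(\frac{1 + C\beta^{m-1}}{1 - C\beta^{m-1}}\Bigr)^5,$

and so by the same reasoning as used earlier,

    $\Bigl(\frac{1 - C\beta^{m-1}}{1 + C\beta^{m-1}}\Bigr)^5 \le \frac{E h_k(\tau^i Z, \tau^j Z) h_{k'}(\tau^{i'} Z, \tau^{j'} Z)}{E h_k(\tau^i Z, \tau^j Z) \, E h_{k'}(\tau^{i'} Z, \tau^{j'} Z)} \le \Bigl(\frac{1 + C\beta^{m-1}}{1 - C\beta^{m-1}}\Bigr)^5.$

The remaining cases can be handled in the same manner.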
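
Corollary 5.8. There exist $C, C' < \infty$ such that for all $n \ge 1$ and all $1 \le K \le L \le \infty$,

(32)  $\mathrm{Var}\Bigl(\sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=K}^{L} h_k(\tau^i Z, \tau^j Z)\Bigr) \le C n^3 \sum_{k=K}^{\infty} \sum_{k'=K}^{\infty} \{(k' + m_* + C') \beta^{k + k'}\}.$

Consequently, for any $\varepsilon > 0$ there exists $K < \infty$ such that for all $n \ge 1$,

(33)  $\mathrm{Var}\Bigl(\sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=K+1}^{\infty} h_k(\tau^i Z, \tau^j Z)\Bigr) \le \varepsilon n^3.$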



Proof. The variance is obtained by summing the covariances of all possible pairs of terms in the sum. Group these by size, according to the value of $\Delta(i, i', j, j')$: for any given value of $\Delta \ge 2$, the number of quadruples $i, i', j, j'$ in the range $[n]$ with $\Delta(i, i', j, j') = \Delta$ is no greater than $24 n^3$. For each such quadruple and any pair $k, k'$ such that $K \le k \le k'$, Lemma 5.6 implies that if $\Delta \ge k' + m_*$ then

    $|\mathrm{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C \beta^{k + k'} \varrho_{\Delta - k'}.$

If $\Delta < m_* + k'$ then the crude Cauchy-Schwarz bounds of Remark 5.7 imply that

    $|\mathrm{cov}(h_k(\tau^i Z, \tau^j Z), h_{k'}(\tau^{i'} Z, \tau^{j'} Z))| \le C \beta^{k + k'}$

where $C < \infty$ is a constant independent of $i, i', j, j', k, k'$. Summing these bounds we find that the variance on the left side of (32) is bounded by a constant multiple of

    $n^3 \sum_{k=K}^{L} \sum_{k'=K}^{L} \beta^{k + k'} \Bigl(k' + m_* + \sum_{\Delta = m_*}^{\infty} \varrho_\Delta\Bigr).$

Since $\varrho_j$ is exponentially decaying in $j$, the inner sum is finite. This proves the inequality (32). The second assertion now follows. ■



5.3. Proof of Theorem 5.1. Given Corollary 5.8 (in particular, the assertion (33)), Theorem 5.1 follows from the special case where all but finitely many of the functions $h_k$ are identically zero, by Lemma A.4 of the Appendix. To see this, observe that under the hypotheses of Theorem 5.1, the random variable $W_n$ can be partitioned as

    $W_n = W_n^K + R_n^K$

where

    $W_n^K = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=1}^{K} h_k(\tau^i Z, \tau^j Z)$ and $R_n^K = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=K+1}^{\infty} h_k(\tau^i Z, \tau^j Z).$

By Proposition 5.3 and Corollary 5.4, for any finite $K$ the sequence $W_n^K / n^{3/2}$ converges to a normal distribution with mean 0 and finite (but possibly zero) variance $\sigma_K^2$. By (33), for any $\varepsilon > 0$ there exists $K < \infty$ such that $E|R_n^K|^2 / n^3 < \varepsilon$ for all $n \ge 1$. Consequently, by Lemma A.4, $\sigma^2 = \lim_{K \to \infty} \sigma_K^2$ exists and is finite, and

    $W_n / n^{3/2} \Rightarrow \mathrm{Normal}(0, \sigma^2).$

6. Mean/Variance Calculations

In this section we verify that in the special case $h_k = u_k + v_k$, where $u_k$ and $v_k$ are the functions defined in section 2 and $h_1 = 0$, the constants $\kappa$ and $\sigma^2$ defined by (22) coincide with the values given earlier in the paper.

Assume throughout this section that $X = X_1 X_2 \cdots$ and $X' = X'_1 X'_2 \cdots$ are two independent stationary Markov chains with transition probabilities (8), both defined on a probability space $(\Omega, P)$ with corresponding expectation operator $E$. Set $h_k = u_k + v_k$. For each fixed (nonrandom) string $x_1 x_2 \cdots$ of length $\ge k$ define $H_k = U_k + V_k$ where

(34)  $U_k(x_1 x_2 \cdots) = E u_k(x_1 x_2 \cdots x_k, X'_1 X'_2 \cdots),$
      $V_k(x_1 x_2 \cdots) = E v_k(x_1 x_2 \cdots x_k, X'_1 X'_2 \cdots),$ and
      $S_K(x_1 x_2 \cdots) = U_2(x_1 x_2) + \sum_{l=3}^{K} (U_l + V_l)(x_1 x_2 \cdots x_K).$

The restrictions of $U_k$ and $V_k$ to the space of infinite sequences are the Hoeffding projections of the functions $u_k$ and $v_k$ (see section 3). Note that each of the functions $U_k, V_k, H_k$ depends only on the first $k$ letters of the string $x_1 x_2 \cdots$. By equations (22) of Theorem 5.1, the limit constants $\kappa$ and $\sigma^2$ are

    $\kappa = \sum_{k=2}^{\infty} E H_k(X)$ and $\sigma^2 = \lim_{n \to \infty} \frac{1}{n} E\Bigl(\sum_{i=1}^{n} \sum_{k=2}^{\infty} H_k(\tau^i X) - n\kappa\Bigr)^2.$

We will prove (Corollary 6.4) that in the particular case of interest here, where $h_k = u_k + v_k$, the random variables $H_k(\tau^i X)$ and $H_{k'}(\tau^{i'} X)$ are uncorrelated unless $i = i'$ and $k = k'$. It then follows that the terms of the sequence defining $\sigma^2$ are all equal, and so

    $\sigma^2 = \sum_{k=2}^{\infty} \mathrm{Var}(H_k(X)).$

Lemma 6.1. For each string $x_1 x_2 \cdots x_k$ of length $k \ge 2$ and each index $i \le k - 1$, define $j_i = j_i(x_1 x_2 \cdots x_k)$ to be the number of letters between $\bar{x}_i$ and $x_{i+1}$ in the reference word $\mathcal{O}$ in the clockwise direction (see Figure 3). Then

(35)  $V_2 = 0$ and $U_k = V_k$ for all $k \ge 3$,

and

(36)  $U_k(x_1 x_2 \cdots) = \frac{t(j_1, j_{k-1})}{g(g-1)^{k-1}}$

where $t(a, b) = a(g - 2 - b) + b(g - 2 - a)$. Therefore,

(37)  $S_K(x_1 x_2 \cdots) = \frac{t(j_1, j_1)}{g(g-1)} + 2 \sum_{k=3}^{K} \frac{t(j_1, j_{k-1})}{g(g-1)^{k-1}}.$

FIGURE 3. The interval of length $j_i$ in $\mathcal{O}$.

Proof. The Markov chain with transition probabilities (8) is reversible (the transition probability matrix (8) is symmetric), and the transition probabilities are unchanged by inversion $a \mapsto \bar{a}$ and $a' \mapsto \bar{a}'$. Hence, the random strings $X'_1 X'_2 \cdots X'_k$ and $\bar{X}'_k \bar{X}'_{k-1} \cdots \bar{X}'_1$ have the same distribution. It follows that for each $k > 2$,

    $U_k(x_1 x_2 \cdots) = V_k(x_1 x_2 \cdots).$

Consider the case $k = 2$. In order that $u_2(x_1 x_2, X'_1 X'_2) \ne 0$ it is necessary and sufficient that the letters $\bar{x}_1 \bar{X}'_1 x_2 X'_2$ occur in cyclic order (either clockwise or counterclockwise) in the reference word $\mathcal{O}$. For clockwise cyclic ordering, the letter $\bar{X}'_1$ must be one of the $j_1$ letters between $\bar{x}_1$ and $x_2$, and $X'_2$ must be one of the $g - 2 - j_1$ letters between $x_2$ and $\bar{x}_1$. Similarly for counterclockwise cyclic ordering, $\bar{X}'_1$ must be one of the $g - 2 - j_1$ letters between $x_2$ and $\bar{x}_1$, and $X'_2$ one of the $j_1$ letters between $\bar{x}_1$ and $x_2$. But $X'_1$, and hence also its inverse $\bar{X}'_1$, is uniformly distributed on the $g$ letters, and given the value of $X'_1$ the random variable $X'_2$ is uniformly distributed on the remaining $g - 1$ letters. Therefore,

    $U_2(x_1 x_2) = \frac{t(j_1, j_1)}{g(g-1)}.$

The case $k \ge 3$ is similar. In order that $u_k(x_1 x_2 \cdots, X'_1 X'_2 \cdots)$ be nonzero it is necessary and sufficient that the strings $x_1 x_2 \cdots x_k$ and $X'_1 X'_2 \cdots X'_k$ differ precisely in the first and $k$th entries, and that the letters $\bar{x}_1 \bar{X}'_1 x_2$ occur in the same cyclic order as the letters $x_k X'_k \bar{x}_{k-1}$. This order will be clockwise if and only if $\bar{X}'_1$ is one of the $j_1$ letters between $\bar{x}_1$ and $x_2$ and $X'_k$ is one of the $g - 2 - j_{k-1}$ letters between $x_k$ and $\bar{x}_{k-1}$. The order will be counterclockwise if and only if $\bar{X}'_1$ is one of the $g - 2 - j_1$ letters between $x_2$ and $\bar{x}_1$ and $X'_k$ is one of the $j_{k-1}$ letters between $\bar{x}_{k-1}$ and $x_k$. Observe that all of these possible choices will lead to reduced words $X'_1 x_2 x_3 \cdots x_{k-1} X'_k$. By (8), the probability of one of these events occurring is

    $U_k(x_1 x_2 \cdots x_k) = \frac{t(j_1, j_{k-1})}{g(g-1)^{k-1}}.$

For $i = 1, 2, \ldots$, define $J_i = j_i(X_1 X_2 \cdots)$ to be the random variable obtained by evaluating the function $j_i$ at a random string generated by the Markov chain, that is, $J_i$ is the number of letters between $\bar{X}_i$ and $X_{i+1}$ in the reference word $\mathcal{O}$ in the clockwise direction. Because $X_{i+1}$ is obtained by randomly choosing one of the letters of $\mathcal{O}$ other than $\bar{X}_i$, the random variable $J_i$ is independent of $X_i$. Since these random choices are all made independently, the following is true:

Lemma 6.2. The random variables $X_1, J_1, J_2, \ldots$ are mutually independent, and each $J_i$ has the uniform distribution on the set $\{0, 1, 2, \ldots, g - 2\}$. Consequently,

    $E J_i = (g - 2)/2,$
    $E J_i^2 = (g - 2)(2g - 3)/6,$
    $E J_i^3 = (g - 2)^2 (g - 1)/4,$
    $E J_i^4 = (g - 2)(2g - 3)(3g^2 - 9g + 5)/30,$
    $E J_i J_{i'} = E J_i \, E J_{i'} = (g - 2)^2/4$ for $i \ne i'$.

By Lemma 6.1, the conditional expectations $U_k, V_k$ are quadratic functions of the cycle gaps $J_1, J_2, \ldots$. Consequently the unconditional expectations

    $E u_k(X, X') = E U_k(X)$

can be deduced from the elementary formulas of Lemma 6.2 by linearity of expectation. Consider first the case $k \ge 3$:

    $g(g - 1)^{k-1} E U_k(X) = E t(J_1, J_{k-1})$
    $= 2 E J_1 (g - 2 - J_{k-1})$
    $= 2(g - 2) E J_1 - 2 E J_1 J_{k-1}$
    $= (g - 2)^2 - (g - 2)^2/2$
    $= (g - 2)^2/2.$

For $k = 2$:

    $g(g - 1) E U_2(X) = E t(J_1, J_1)$
    $= 2 E J_1 (g - 2 - J_1)$
    $= 2(g - 2) E J_1 - 2 E J_1^2$
    $= (g - 2)^2 - (g - 2)(2g - 3)/3$
    $= (g - 2)(g - 3)/3.$
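
The moment formulas of Lemma 6.2 and the two mean computations just given can be confirmed by exact enumeration over the finite set $\{0, 1, \ldots, g - 2\}$. The sketch below is a sanity check rather than part of the argument; the helper names are assumptions, and the range of $g$ tested is arbitrary.

```python
from fractions import Fraction as F
from itertools import product


def moment(g, p):
    """E J^p for J uniform on {0, 1, ..., g - 2}."""
    return F(sum(j ** p for j in range(g - 1)), g - 1)


def t(a, b, g):
    return a * (g - 2 - b) + b * (g - 2 - a)


def mean_t_independent(g):
    """E t(J_1, J_{k-1}) for two independent gaps (the case k >= 3)."""
    gaps = range(g - 1)
    return F(sum(t(a, b, g) for a, b in product(gaps, repeat=2)), (g - 1) ** 2)


def mean_t_diagonal(g):
    """E t(J_1, J_1) (the case k = 2)."""
    return F(sum(t(a, a, g) for a in range(g - 1)), g - 1)


def check(g):
    return (
        moment(g, 1) == F(g - 2, 2)
        and moment(g, 2) == F((g - 2) * (2 * g - 3), 6)
        and moment(g, 3) == F((g - 2) ** 2 * (g - 1), 4)
        and moment(g, 4) == F((g - 2) * (2 * g - 3) * (3 * g * g - 9 * g + 5), 30)
        and mean_t_independent(g) == F((g - 2) ** 2, 2)
        and mean_t_diagonal(g) == F((g - 2) * (g - 3), 3)
    )


results = {g: check(g) for g in range(3, 13)}
```

Using `Fraction` keeps every comparison exact, so a mismatch in any closed form would show up as `False` for some `g`.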

Corollary 6.3. If $X = X_1 X_2 \cdots$ and $X' = X'_1 X'_2 \cdots$ are independent realizations of the stationary Markov chain with transition probabilities (8), then

(39)  $E u_2(X, X') = E U_2(X) = \frac{(g - 2)(g - 3)}{3g(g - 1)},$

(40)  $E u_k(X, X') = E U_k(X) = \frac{(g - 2)^2}{2g(g - 1)^{k-1}}$ for $k \ge 3$, and

(41)  $E S_\infty(X) = E u_2(X, X') + \sum_{k=3}^{\infty} E(u_k + v_k)(X, X') = \frac{g - 2}{3(g - 1)}.$

The variances and covariances of the random variables $U_k(X)$ can be calculated in similar fashion, using the independence of the cycle gaps $J_k$ and the moment formulas in Lemma 6.2. It is easier to work with the scaled variables $t(J_1, J_k) = g(g - 1)^k U_{k+1}$ rather than with the variables $U_k$, and for convenience we will write $J_i^R = g - 2 - J_i$. Note that by definition and Lemma 6.2 the random variables $J_i$ and $J_i^R$ both have the same distribution (uniform on the set $\{0, 1, \ldots, g - 2\}$), and therefore also the same moments.

Case 0: If $i, j, k, m$ are distinct, or if $i = j$ and $i, k, m$ are distinct, then

    $E t(J_i, J_j) t(J_k, J_m) = E t(J_i, J_j) \, E t(J_k, J_m),$

since the random variables $J_i, J_j, J_k, J_m$ (or in the second case $J_i, J_k, J_m$) are independent. It follows that for any indices $i, j, k, m$ such that $i \ne j$ and $i + k \ne j + m$, the random variables $U_k(\tau^i X)$ and $U_m(\tau^j X)$ are uncorrelated. (Here, as usual, $\tau$ is the forward shift operator.)

Case 1: If $i, k, m \ge 1$ are distinct then

    $E t(J_i, J_k) t(J_i, J_m) = E (J_i J_k^R + J_k J_i^R)(J_i J_m^R + J_m J_i^R)$
    $= E J_i J_k^R J_i J_m^R + E J_i J_k^R J_i^R J_m + E J_i^R J_k J_i J_m^R + E J_i^R J_k J_i^R J_m$
    $= ((g - 2)^2/4)(E J_i^2 + E J_i J_i^R + E J_i^R J_i + E (J_i^R)^2)$
    $= ((g - 2)^2/4) E (J_i + J_i^R)^2$
    $= (g - 2)^4/4$
    $= E t(J_i, J_k) \, E t(J_i, J_m).$

Thus, the random variables $t(J_i, J_k)$ and $t(J_i, J_m)$ are uncorrelated. Consequently, for all choices of $i, j, k \ge 1$ such that $j \ne k$, the random variables $U_j(\tau^i X)$ and $U_k(\tau^i X)$ are uncorrelated.
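
The vanishing covariance in Case 1 can be double-checked by brute force, summing over all triples of gap values. The sketch below assumes only that the three gaps are independent and uniform on $\{0, \ldots, g - 2\}$, as in Lemma 6.2; the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product


def t(a, b, g):
    return a * (g - 2 - b) + b * (g - 2 - a)


def shared_gap_moments(g):
    """Return (E[t(Ji,Jk) t(Ji,Jm)], E[t(Ji,Jk)]) for independent uniform gaps."""
    gaps = range(g - 1)
    e_prod = F(sum(t(i, k, g) * t(i, m, g) for i, k, m in product(gaps, repeat=3)),
               (g - 1) ** 3)
    e_single = F(sum(t(i, k, g) for i, k in product(gaps, repeat=2)), (g - 1) ** 2)
    return e_prod, e_single


covariances = {}
for g in range(3, 10):
    e_prod, e_single = shared_gap_moments(g)
    covariances[g] = e_prod - e_single ** 2
```

Every entry of `covariances` is exactly zero, and the mixed moment equals $(g - 2)^4 / 4$ as in the display.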

Case 2: If $i \ne k$ then

    $E t(J_i, J_i) t(J_i, J_k) = E J_i J_i^R J_i J_k^R + E J_i J_i^R J_i^R J_k + E J_i^R J_i J_i J_k^R + E J_i^R J_i J_i^R J_k$
    $= ((g - 2)/2)(2 E J_i J_i J_i^R + 2 E J_i J_i^R J_i^R)$
    $= 2(g - 2)(E J_i J_i (g - 2 - J_i))$
    $= 2(g - 2)((g - 2) E J_i^2 - E J_i^3)$
    $= (g - 2)^3 (g - 3)/6$
    $= E t(J_i, J_k) \, E t(J_i, J_i).$

Once again, the two random variables are uncorrelated. It follows that for all $i \ge 1$ and $m \ge 3$ the random variables $U_2(\tau^i X)$ and $U_m(\tau^i X)$ are uncorrelated.

Case 3: If $k \ge 2$ then

    $E t(J_1, J_k)^2 = E J_1 J_1 J_k^R J_k^R + E J_1^R J_1^R J_k J_k + 2 E J_1^R J_1 J_k^R J_k$
    $= 2 (E J_1^2)^2 + 2 (E J_1 J_1^R)^2$
    $= (g - 2)^2 (2g - 3)^2/18 + (g - 2)^2 (g - 3)^2/18$

and so

    $\mathrm{var}(t(J_1, J_k)) = E t(J_1, J_k)^2 - (E t(J_1, J_k))^2$
    $= (g - 2)^2 (2g - 3)^2/18 + (g - 2)^2 (g - 3)^2/18 - (g - 2)^4/4$
    $= g^2 (g - 2)^2/36.$

Case 4: When $k = 1$:

    $E t(J_1, J_1)^2 = 4 E J_1 J_1 J_1^R J_1^R$
    $= 4((g - 2)^2 E J_1^2 - 2(g - 2) E J_1^3 + E J_1^4)$
    $= 2(g - 2)(g - 3)(g^2 - 4g + 5)/15,$

so

    $\mathrm{var}(t(J_1, J_1)) = E t(J_1, J_1)^2 - (E t(J_1, J_1))^2$
    $= 2(g - 2)(g - 3)(g^2 - 4g + 5)/15 - (g - 2)^2 (g - 3)^2/9$
    $= g(g - 2)(g - 3)(g + 1)/45.$
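
The closed forms obtained in Cases 2 to 4 can also be confirmed by exhaustive enumeration. The sketch below is only a check under the assumptions of Lemma 6.2 (independent gaps, each uniform on $\{0, \ldots, g - 2\}$); the helper names are not from the paper.

```python
from fractions import Fraction as F
from itertools import product


def t(a, b, g):
    return a * (g - 2 - b) + b * (g - 2 - a)


def mean(values):
    values = list(values)
    return F(sum(values), len(values))


def var_independent(g):
    """var t(J_1, J_k) for independent gaps (Case 3)."""
    pairs = list(product(range(g - 1), repeat=2))
    return mean(t(a, b, g) ** 2 for a, b in pairs) - mean(t(a, b, g) for a, b in pairs) ** 2


def var_diagonal(g):
    """var t(J_1, J_1) (Case 4)."""
    gaps = range(g - 1)
    return mean(t(a, a, g) ** 2 for a in gaps) - mean(t(a, a, g) for a in gaps) ** 2


def cov_case2(g):
    """cov(t(J_i, J_i), t(J_i, J_k)) for independent gaps (Case 2)."""
    pairs = list(product(range(g - 1), repeat=2))
    e_prod = mean(t(i, i, g) * t(i, k, g) for i, k in pairs)
    return e_prod - mean(t(i, i, g) for i, _ in pairs) * mean(t(i, k, g) for i, k in pairs)


checks = {
    g: (var_independent(g) == F(g * g * (g - 2) ** 2, 36)
        and var_diagonal(g) == F(g * (g - 2) * (g - 3) * (g + 1), 45)
        and cov_case2(g) == 0)
    for g in range(3, 10)
}
```

For example, with $g = 4$ the two variances come out as $16/9$ and $8/9$, matching the formulas.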

This proves:

Corollary 6.4. The random variables $U_k(\tau^i X)$, where $i \ge 0$ and $k \ge 2$, are uncorrelated, and have variances

(42)  $\mathrm{Var}(U_k(\tau^i X)) = \frac{(g - 2)^2}{36(g - 1)^{2k-2}}$ for $k \ge 3$,
      $\mathrm{Var}(U_2(\tau^i X)) = \frac{(g - 2)(g - 3)(g + 1)}{45g(g - 1)^2}.$

Consequently,

(43)  $\mathrm{Var}(S_\infty(\tau^i X)) = \mathrm{Var}(U_2(X)) + \sum_{k=3}^{\infty} \mathrm{Var}(2U_k(X))$
      $= \mathrm{Var}(U_2(X)) + \lim_{K \to \infty} \sum_{k=3}^{K} \mathrm{Var}(2U_k(X))$
      $= \frac{(g - 2)(g - 3)(g + 1)}{45g(g - 1)^2} + \frac{g - 2}{9g(g - 1)^2}$
      $= \frac{(g - 2)(g^2 - 2g + 2)}{45g(g - 1)^2}$
      $= \frac{2\chi(2\chi^2 - 2\chi + 1)}{45(2\chi - 1)^2 (\chi - 1)}.$
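
The geometric series behind (43) can be checked in exact arithmetic. The sketch below uses the variances from (42); the relation $g = 2 - 2\chi$ used in `var_S_chi` is an assumption inferred from the last equality in (43), not something stated in this section, and the function names are illustrative.

```python
from fractions import Fraction as F


def var_U2(g):
    return F((g - 2) * (g - 3) * (g + 1), 45 * g * (g - 1) ** 2)


def var_Uk(g, k):
    return F((g - 2) ** 2, 36 * (g - 1) ** (2 * k - 2))


def var_S_partial(g, K):
    """Var(U_2) plus the sum of Var(2 U_k) = 4 Var(U_k) for 3 <= k <= K."""
    return var_U2(g) + sum(4 * var_Uk(g, k) for k in range(3, K + 1))


def var_S_limit(g):
    return F((g - 2) * (g * g - 2 * g + 2), 45 * g * (g - 1) ** 2)


def var_S_chi(chi):
    """The same limit written in terms of chi, assuming g = 2 - 2 * chi."""
    return F(2 * chi * (2 * chi * chi - 2 * chi + 1), 45 * (2 * chi - 1) ** 2 * (chi - 1))


# The tail left after K terms is itself a geometric series with a closed form.
tails_ok = all(
    var_S_limit(g) - var_S_partial(g, K) == F(g - 2, 9 * g * (g - 1) ** (2 * K - 2))
    for g in range(3, 9)
    for K in range(2, 7)
)
```

The tail shrinks by a factor $(g - 1)^{-2}$ with each additional term, so the partial sums converge to the stated limit very quickly.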

Appendix A. Background: Probability, Markov chains, weak convergence 

For the convenience of the reader we shall review some of the terminology of the subject here. (All of this is standard, and can be found in most introductory textbooks, for instance,
13 and Q.) 

A probability space is a measure space $(\Omega, \mathcal{B}, P)$ with total mass 1. Integrals with respect to $P$ are called expectations and denoted by the letter $E$, or by $E_P$ if the dependence on $P$ must be emphasized. A random variable is a measurable, real-valued function on $\Omega$; similarly, a random vector or a random sequence is a measurable function taking values in a vector space or sequence space. The distribution of a random variable, vector, or sequence $X$ is the induced probability measure $P \circ X^{-1}$ on the range of $X$. Most questions of interest in the subject concern the distributions of various random objects, so the particular probability space on which these objects are defined is usually not important; however, it is sometimes necessary to move to a "larger" probability space (e.g., a product space) to ensure that auxiliary random variables can be defined. This is the case, for instance, in sec. 6, where independent copies of a Markov chain are needed.

Definition A.1. A sequence $\ldots, X_{-1}, X_0, X_1, \ldots$ of $\mathcal{G}$-valued random variables defined on some probability space $(\Omega, \mathcal{B}, P)$ is said to be a stationary Markov chain with stationary distribution $\pi$ and transition probabilities $p(a, a')$ if for every finite sequence $w = w_0 w_1 \cdots w_k$ of elements of $\mathcal{G}$ and every integer $m$,

(44)  $P\{X_{m+j} = w_j \text{ for each } 0 \le j \le k\} = \pi(w_0) \prod_{j=0}^{k-1} p(w_j, w_{j+1}).$

If $p(a, a')$ is a stochastic matrix on the set $\mathcal{G}$ and $\pi$ satisfies the stationarity condition $\pi(a) = \sum_{a'} \pi(a') p(a', a)$ then there is a probability measure on the sequence space $\mathcal{G}^{\mathbb{Z}}$ under which the coordinate variables form a Markov chain with transition probabilities $p(a, a')$ and stationary distribution $\pi$. This follows from standard measure extension theorems; see, e.g., [2], sec. 1.8.
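
The product formula (44) and the role of the stationarity condition can be illustrated on a two-letter alphabet. The sketch below is a toy example: the transition probabilities `p` and the distribution `pi` are hypothetical values chosen so that `pi` is stationary for `p`.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-letter chain; pi is stationary for p.
p = {("a", "a"): F(2, 3), ("a", "b"): F(1, 3), ("b", "a"): F(1, 2), ("b", "b"): F(1, 2)}
pi = {"a": F(3, 5), "b": F(2, 5)}


def word_probability(w):
    """Right side of (44): pi(w_0) times the product of p(w_j, w_{j+1})."""
    prob = pi[w[0]]
    for x, y in zip(w, w[1:]):
        prob *= p[(x, y)]
    return prob


# Probabilities of all words of a fixed length sum to one.
total = sum(word_probability(w) for w in product("ab", repeat=4))

# Summing out the first letter recovers the formula for the shorter word,
# which is the consistency needed for the law not to depend on the integer m.
shifted = {w: sum(word_probability((x,) + w) for x in "ab") for w in product("ab", repeat=3)}
```

The second check is exactly where the condition $\pi(a) = \sum_{a'} \pi(a') p(a', a)$ enters.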

Definition A.2. A sequence of random variables $X_n$ (not necessarily all defined on the same probability space) is said to converge weakly or in distribution to a limit distribution $F$ on $\mathbb{R}$ (denoted by $X_n \Rightarrow F$) if the distributions $F_n$ of $X_n$ converge to $F$ in the weak topology on measures, that is, if for every bounded, continuous function $\varphi : \mathbb{R} \to \mathbb{R}$ (or equivalently, for every continuous function $\varphi$ with compact support),

    $\lim_{n \to \infty} \int \varphi \, dF_n = \int \varphi \, dF.$

It is also customary to write $F_n \Rightarrow F$ for this convergence, since it is really a property of the distributions. When the limit distribution $F$ is the point mass $\delta_0$ at 0 we may sometimes write $X_n \Rightarrow 0$ instead of $X_n \Rightarrow \delta_0$. The weak topology on probability measures is metrizable; when necessary we will denote by $\varrho$ a suitable metric. It is an elementary fact that weak convergence of probability distributions on $\mathbb{R}$ is equivalent to the pointwise convergence of the cumulative distribution functions at all points of continuity of the limit cumulative distribution function. Thus, Theorem 3.1 is equivalent to the assertion that the random variables $(N(\alpha) - n^2 \kappa)/n^{3/2}$ on the probability spaces $(\mathcal{F}_n, \mu_n)$ converge in distribution to $\Phi_\sigma$.

We conclude with several elementary tools of weak convergence that will be used repeatedly throughout the paper. First, given any countable family $X_n$ of random variables, possibly defined on different probability spaces, there exist on the Lebesgue space ([0, 1], Lebesgue) random variables $Y_n$ such that for each $n$ the random variables $X_n$ and $Y_n$ have the same distribution. Furthermore, the random variables $Y_n$ can be constructed in such a way that if the random variables $X_n$ converge in distribution then the random variables $Y_n$ converge pointwise on [0, 1] (the converse is trivial). Next, define the total variation distance between two probability measures $\mu$ and $\nu$ defined on a common measurable space $(\Omega, \mathcal{B})$ by

    $\|\mu - \nu\|_{TV} = \max(\mu(A) - \nu(A))$

where $A$ ranges over all measurable subsets (events) of $\Omega$. Total variation distance is never increased by mapping, that is, if $T : \Omega \to \Omega'$ is a measurable transformation then

(45)  $\|\mu \circ T^{-1} - \nu \circ T^{-1}\|_{TV} \le \|\mu - \nu\|_{TV}.$

Also, if $\mu$ and $\nu$ are mutually absolutely continuous, with Radon-Nikodym derivative $d\mu/d\nu$, then

(46)  $\|\mu - \nu\|_{TV} = \frac{1}{2} E_\nu \Bigl| \frac{d\mu}{d\nu} - 1 \Bigr|.$
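
On a finite space the total variation distance can be computed directly from its definition as a maximum over events, and compared with the standard density formula (one half of the $\nu$-expected value of $|d\mu/d\nu - 1|$). The sketch below is illustrative only: the two distributions and the map `merge` are hypothetical.

```python
from fractions import Fraction as F
from itertools import chain, combinations

mu = {"x": F(1, 2), "y": F(1, 3), "z": F(1, 6)}   # hypothetical distributions
nu = {"x": F(1, 4), "y": F(1, 4), "z": F(1, 2)}


def tv_by_events(p, q):
    """Maximum of p(A) - q(A) over all events A, by brute force."""
    points = list(p)
    events = chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))
    return max(sum(p[s] - q[s] for s in event) for event in events)


def tv_by_density(p, q):
    """Half the q-expectation of |dp/dq - 1|; requires q to charge every point."""
    return sum(q[s] * abs(p[s] / q[s] - 1) for s in q) / 2


def push_forward(p, mapping):
    """Distribution of mapping(s) when s has distribution p."""
    image = {}
    for s, mass in p.items():
        image[mapping(s)] = image.get(mapping(s), 0) + mass
    return image


def merge(s):
    """Hypothetical map that identifies the points x and y."""
    return "u" if s in ("x", "y") else "v"


tv = tv_by_events(mu, nu)
tv_image = tv_by_events(push_forward(mu, merge), push_forward(nu, merge))
```

The two formulas agree, and pushing both measures through `merge` does not increase the distance, in line with (45).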



It is easily seen that if a sequence of probability measures $\{\mu_n\}_{n \ge 1}$ on $\mathbb{R}$ is Cauchy in total variation distance then the sequence converges in distribution. The following lemma is elementary:

Lemma A.3. Let $X_n$ and $Y_n$ be two sequences of random variables, all defined on a common probability space, let $a_n$ be a sequence of scalars, and fix $r > 0$. Denote by $F_n$ and $G_n$ the distributions of $X_n$ and $Y_n$, respectively. Then the equivalence

(47)  $\frac{Y_n - a_n}{n^r} \Rightarrow F$ if and only if $\frac{X_n - a_n}{n^r} \Rightarrow F$

holds if either

(48)  $(X_n - Y_n)/n^r \Rightarrow 0$ or
(49)  $\|F_n - G_n\|_{TV} \to 0$

as $n \to \infty$. Furthermore, (49) implies (48).

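The following lemma is an elementary consequence of Chebyshev's inequality and the definition of weak convergence.

Lemma A.4. Let $X_n$ be a sequence of random variables. Suppose that for every $\varepsilon > 0$ there exist random variables $X_n^\varepsilon$ and $R_n^\varepsilon$ such that

(50)  $X_n = X_n^\varepsilon + R_n^\varepsilon,$
      $X_n^\varepsilon \Rightarrow \mathrm{Normal}(0, \sigma_\varepsilon^2),$ and
      $E|R_n^\varepsilon|^2 \le \varepsilon.$

Then $\lim_{\varepsilon \to 0} \sigma_\varepsilon^2 := \sigma^2 \ge 0$ exists and is finite, and

(51)  $X_n \Rightarrow \mathrm{Normal}(0, \sigma^2).$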



SELF-INTERSECTIONS IN COMBINATORIAL TOPOLOGY: STATISTICAL STRUCTURE 






References 

[1] Patrick Billingsley. Convergence of probability measures. John Wiley & Sons Inc., New York, 1968.

[2] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1999. A Wiley-Interscience Publication.

[3] Joan S. Birman and Caroline Series. An algorithm for simple curves on surfaces. J. London Math. Soc. (2), 29(2):331-342, 1984.

[4] Moira Chas. Combinatorial Lie bialgebras of curves on surfaces. Topology, 43(3):543-568, 2004.

[5] Moira Chas and Anthony Phillips. Self-intersection numbers of curves on the doubly punctured sphere.

[6] Moira Chas and Anthony Phillips. Self-intersection numbers of curves on the punctured torus. Experimental Mathematics, 2010.

[7] Marshall Cohen and Martin Lustig. Paths of geodesics and geometric intersection numbers. I. In Combinatorial group theory and topology (Alta, Utah, 1984), volume 111 of Ann. of Math. Stud., pages 479-500. Princeton Univ. Press, Princeton, NJ, 1987.

[8] Manfred Denker and Gerhard Keller. On U-statistics and v. Mises' statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete, 64(4):505-522, 1983.

[9] Wassily Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statistics, 19:293-325, 1948.

[10] Marius Iosifescu. On U-statistics and von Mises statistics for a special class of Markov chains. J. Statist. Plann. Inference, 30(3):395-400, 1992.

[11] Steven P. Lalley. Self-intersections of random geodesics on negatively curved surfaces.

[12] Steven P. Lalley. Self-intersections of closed geodesics on a negatively curved surface: statistical regularities. In Convergence in ergodic theory and probability (Columbus, OH, 1993), volume 5 of Ohio State Univ. Math. Res. Inst. Publ., pages 263-272. de Gruyter, Berlin, 1996.

[13] Maryam Mirzakhani. Growth of the number of simple closed geodesics on hyperbolic surfaces. Ann. of Math. (2), 168(1):97-125, 2008.

Stony Brook University, Department of Mathematics, Stony Brook, NY, 11794.

University of Chicago, Department of Statistics, 5734 University Avenue, Chicago IL 60637.
E-mail address: moira@math.sunysb.edu, lalley@galton.uchicago.edu



